{"title": "Prior-Free Dynamic Auctions with Low Regret Buyers", "book": "Advances in Neural Information Processing Systems", "page_first": 4803, "page_last": 4813, "abstract": "We study the problem of how to repeatedly sell to a buyer running a no-regret, mean-based algorithm. Previous work [Braverman et al., 2018] shows that it is possible to design effective mechanisms in such a setting that extract almost all of the economic surplus, but these mechanisms require the buyer's values each round to be drawn independently and identically from a fixed distribution. In this work, we do away with this assumption and consider the prior-free setting where the buyer's value each round is chosen adversarially (possibly adaptively). \n\nWe show that even in this prior-free setting, it is possible to extract a $(1-\\varepsilon)$-approximation of the full economic surplus for any $\\varepsilon > 0$. The number of options offered to a buyer in any round scales independently of the number of rounds $T$ and polynomially in $\\varepsilon$. We show that this is optimal up to a polynomial factor; any mechanism achieving this approximation factor, even when values are drawn stochastically, requires at least $\\Omega(1/\\varepsilon)$ options.\n\nFinally, we examine what is possible when we constrain our mechanism to a natural auction format where overbidding is dominated. Braverman et al. [2018] show that even when values are drawn from a known stochastic distribution supported on $[1/H, 1]$, it is impossible in general to extract more than $O(\\log\\log H / \\log H)$ of the economic surplus. 
We show how to achieve the same approximation factor in the prior-independent setting (where the distribution is unknown to the seller), and an approximation factor of $O(1 / \\log H)$ in the prior-free setting (where the values are chosen adversarially).", "full_text": "Prior-Free Dynamic Auctions with Low Regret Buyers\n\nYuan Deng (Duke University, ericdy@cs.duke.edu), Jon Schneider (Google Research, jschnei@google.com), Balasubramanian Sivan (Google Research, balusivan@google.com)\n\nAbstract\n\nWe study the problem of how to repeatedly sell to a buyer running a no-regret, mean-based algorithm. Previous work [Braverman et al., 2018] shows that it is possible to design effective mechanisms in such a setting that extract almost all of the economic surplus, but these mechanisms require the buyer's values each round to be drawn independently and identically from a fixed distribution. In this work, we do away with this assumption and consider the prior-free setting where the buyer's value each round is chosen adversarially (possibly adaptively).\nWe show that even in this prior-free setting, it is possible to extract a (1 − ε)-approximation of the full economic surplus for any ε > 0. The number of options offered to a buyer in any round scales independently of the number of rounds T and polynomially in 1/ε. We show that this is optimal up to a polynomial factor; any mechanism achieving this approximation factor, even when values are drawn stochastically, requires at least Ω(1/ε) options. Finally, we examine what is possible when we constrain our mechanism to a natural auction format where overbidding is dominated. Braverman et al. [2018] show that even when values are drawn from a known stochastic distribution supported on [1/H, 1], it is impossible in general to extract more than O(log log H / log H) of the economic surplus.
We show how to achieve the same approximation factor in the prior-independent setting (where the distribution is unknown to the seller), and an approximation factor of O(1/log H) in the prior-free setting (where the values are chosen adversarially).\n\n1 Introduction\n\nRevenue-optimal auction design in settings where a seller interacts repeatedly with a buyer (as in the sale of Internet ads) is a problem of high commercial relevance. The promise of dynamic auctions, which link buyers' decisions across time, is the significantly higher revenue they can achieve over running independent/decoupled auctions across time. The technical challenges that dynamic auctions introduce, along with their practical impact, have inspired a lot of recent work in this area [Papadimitriou et al., 2016, Ashlagi et al., 2016, Mirrokni et al., 2018].\n\nTraditionally, almost all work in dynamic mechanism design operates in the regime where the players' types (e.g. bidders' values) are drawn stochastically from a fixed distribution. In many situations this is far from a realistic assumption – for example, if the values of a buyer are modelled as a distribution, this underlying distribution likely drifts over time and is also subject to shocks determined by uncontrolled exogenous events. But this assumption is also in many ways critical: in a dynamic mechanism in an adversarial setting, a fully rational buyer (who cares about the effect of his current action on his future utility) would be unable to compute his future utility at any point of time in the game and thus unable to meaningfully best-respond.\n\nOn the other hand, auctions for digital ads have become increasingly more complex over time.
The design space of dynamic auctions, in which a buyer bids on many items over the course of many rounds, is very rich and has room for exceedingly complex auctions.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nA bidder may have difficulty behaving fully rationally in such an auction: the bidder may not have accurate priors for other bidders, the bidder may not completely understand the mechanism, and finding an equilibrium might be computationally hard. Instead of acting fully rationally, a bidder might instead choose to try to learn how to bid over time, for example by using a no-regret learning algorithm. Recently, several streams of work (e.g. Agrawal et al. [2018], Braverman et al. [2018]) have explored the problem of how to design dynamic auctions for such bidders. In all cases these works assume, as is standard, that bidders' values are stochastically generated. However, one intriguing feature of modelling a bidder as a learning agent is that it no longer restricts us to the stochastic setting – the actions taken by a learning algorithm are perfectly well-defined in (and ostensibly even designed for) the prior-free setting where values are drawn adversarially. This opens a wealth of questions of how to robustly design dynamic mechanisms that perform well in the worst case against some class of learning agents. In this paper, we explore this question for one of the simplest problems in dynamic mechanism design: repeatedly selling a single item to a single buyer for T rounds.\n\nWe build off the setting of [Braverman et al., 2018], where they model the buyer as a learner running a mean-based low-regret algorithm.
Intuitively, mean-based algorithms prefer to select actions that have performed well historically on average (it can be shown that many classic learning algorithms, like EXP3, Multiplicative Weights, and Follow-the-Perturbed-Leader, are all mean-based low-regret algorithms). In [Braverman et al., 2018], the authors show that, surprisingly, when the buyer's values vt ∈ [0, 1] are drawn from a fixed distribution, it is possible to design a simple mechanism that obtains almost the full economic surplus (i.e., Val = E[Σ_t vt]) as revenue. Their mechanism, however, relies crucially on the fact that the buyer's values are drawn from the same distribution every round. In particular, it is straightforward to verify that there exist sequences of values for the buyer that result in this mechanism receiving asymptotically zero total revenue.\n\nIn this paper we design mechanisms for this problem in the prior-free setting, when the buyer's values vt ∈ [0, 1] are chosen adversarially (possibly adaptively). In the course of doing this, we aim to minimize the complexity of our mechanisms, measured in terms of the number of distinct options (i.e. “bids”) the mechanism presents to the bidder in any round. We call this quantity the option-complexity of the mechanism. Note that in mechanisms with high option-complexity it becomes harder to learn how to bid.
If the option-complexity of the mechanism begins to scale with the number of rounds T, this may even nullify any sort of low-regret or mean-based guarantee the learning algorithm has (it may not even be possible to explore all potential options).\n\nUpper bound in the adversarial setting: We design a non-adaptive (i.e., does not use historical bids/allocations/prices) option-based mechanism that yields a revenue of Val − O(εT) with O(ln(1/ε)/ε³) options, where the instance (v1, · · · , vT) is chosen by a (possibly adaptive) adversary and Val is the total economic surplus defined by Val = Σ_{t=1}^T vt.\n\nLower bound in the stochastic (and hence adversarial) setting: We show that even if values are drawn from an unknown stochastic distribution (i.e. in every round the buyer's value is drawn independently from some distribution D), any non-adaptive option-based mechanism needs to offer at least Ω(1/ε) options to attain Val − O(εT) revenue. This implies the option-complexity of our algorithm is tight up to a polynomial factor in 1/ε.\n\nUpper bound in the stochastic setting with unknown distribution via critical mechanisms: Finally, although our mechanisms have relatively low option-complexity, they can still appear unnatural and complex. We examine what is possible by further restricting our mechanisms to critical mechanisms [Braverman et al., 2018], imposing the desiderata of individual rationality, monotonicity of price and allocation in bid, and overbidding being dominated (see Section 2). Braverman et al. [2018] show that the seller can use a critical mechanism to extract good revenue but not all of the surplus; in particular, the seller can always guarantee revenue equal to an O(log log H / log H) fraction of the total economic surplus when buyer values lie in the interval [1/H, 1], and this competitive ratio is tight.
This critical mechanism requires full knowledge of the value distribution D. We design a critical mechanism that achieves this same approximation factor, but in a prior-independent setting where the distribution D is unknown. In addition, we show that it is possible to achieve a slightly worse competitive ratio of O(1/log H) in the prior-free (adversarial values) setting by adapting existing prior-free mechanisms for the single-shot instance of this problem.\n\nWe emphasize that all the mechanisms we present are non-adaptive (i.e. allocation and payment rules at all times are fixed at the beginning of the protocol, and are not functions of the historical bids/allocations/payments), as in [Braverman et al., 2018].\n\n1.1 Related Work\n\nOur work is closely related to the dynamic mechanism design literature, such as [Balseiro et al., 2017, Liu and Psomas, 2017, Agrawal et al., 2018, Mirrokni et al., 2018, Balseiro et al., 2019], which studies how to sell items online to a fixed set of strategic buyers whose valuations are fixed or drawn from some distributions. In these works, however, the buyers are fully strategic: their bidding strategies aim to maximize their cumulative utility throughout the auction.\n\nNo-regret algorithms were first introduced in the context of the multi-armed bandit problem and have been widely studied (see Bubeck et al. [2012] for a survey). Applications of low-regret learning to algorithmic game theory are widespread (e.g. [Roughgarden, 2012, Syrgkanis and Tardos, 2013, Nekipelov et al., 2015, Daskalakis and Syrgkanis, 2016]).
Most applications to dynamic auction design take the perspective of a seller attempting to learn the optimal auction against strategic buyers [Amin et al., 2013, 2014, Cole and Roughgarden, 2014, Morgenstern and Roughgarden, 2015, Devanur et al., 2016, Morgenstern and Roughgarden, 2016, Gonczarowski and Nisan, 2017, Cai and Daskalakis, 2017, Dudík et al., 2017, Drutsa, 2017, 2018, Liu et al., 2018]. Recent work takes the perspective of buyers and applies learning algorithms to help them learn how to bid in repeated and dynamic auctions [Feng et al., 2018, Balseiro et al., 2018].\n\nIn contrast to these works, we take the perspective of the seller and design online auctions against buyers who are running no-regret bidding algorithms. As pointed out in a seminal empirical work [Nekipelov et al., 2015], bidders' behavior on Bing is largely consistent with a no-regret learning algorithm, which motivates the question of designing a dynamic mechanism against such no-regret learning behavior. Braverman et al. [2018] initiated the study of mechanism design against a no-regret buyer when the buyer's valuations are drawn from a fixed and known distribution. In contrast to their work, we design mechanisms against a no-regret buyer in a prior-free / prior-independent setting.\n\n2 Model and Preliminaries\n\nOur setting is similar to the setting considered in [Braverman et al., 2018]: we consider a multi-round auction where in every round a seller attempts to sell an item to a buyer running a low-regret (in fact, mean-based) algorithm to learn how to bid.\n\nSpecifically, we consider a T-round auction with one buyer and one seller. In each round t, there is one item for sale. At the beginning of this round, the buyer learns his private valuation vt ∈ V ⊆ [0, 1] for this item.
These valuations vt can be generated in one of two ways: (1) Adversarial, where vt is chosen arbitrarily by a (possibly adaptive) adversary; and (2) Stochastic, where vt is independently drawn from some distribution D. This distribution D may or may not be known to the seller; we will mostly consider the case where D is unknown to the seller (i.e., the prior-independent setting).\n\nFor simplicity, we assume the values vt belong to a finite set V. This is solely for the purpose of providing a finite number of different contexts to the buyer's learning algorithm and otherwise does not affect our mechanism at all.\n\nTo measure the performance of our mechanisms, we compare the revenue extracted by the mechanism to the welfare, the total value the buyer assigns to all the items.\n\nDefinition 1. The welfare Val(v1, · · · , vT) is equal to Σ_{t=1}^T vt.\n\nThe welfare clearly provides an upper bound on the revenue of our mechanisms. In cases where vt is drawn from some distribution D, we will write Val(D) = E_{x∼D}[x] · T to denote the expected welfare under this distribution.\n\n2.1 Mechanism format\n\nSince the buyer is running a learning algorithm, it is especially important to specify the manner of interaction between the buyer and the seller. We consider two classes of mechanisms for the seller: option-based mechanisms and critical mechanisms.\n\nIn an option-based mechanism, the seller offers the buyer K options (labeled 1 through K) each round. If the buyer selects option i at time t, the buyer receives the item with probability ai,t and pays a price pi,t. A natural measure of complexity for such mechanisms is the number of options K presented to the buyer, which we refer to as the option-complexity of the mechanism.
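The round-by-round interaction with an option-based mechanism can be sketched in code. This is a minimal illustration, not the paper's notation: the function name `run_mechanism` and the `pull`/`update` learner interface are assumptions introduced here.

```python
def run_mechanism(options, values, learner):
    """Sketch of the T-round protocol with an option-based mechanism.

    options[t][k] = (a, p): choosing option k in round t wins the item
    with probability a and incurs payment p.  `learner` is any object
    with pull(context) -> option index and update(context, rewards);
    this interface is illustrative, not from the paper.
    """
    revenue, welfare = 0.0, 0.0
    for t, v in enumerate(values):
        k = learner.pull(v)                        # buyer sees v_t, then chooses
        a, p = options[t][k]
        revenue += p                               # seller's (expected) revenue
        welfare += v                               # Val = sum_t v_t
        # experts feedback: reward of every option under today's value
        learner.update(v, [ai * v - pi for ai, pi in options[t]])
    return revenue, welfare
```

The option-complexity is simply `len(options[t])`; keeping it independent of T is what the mechanisms below aim for.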
Limiting this complexity is especially important when interacting with learning agents, as they require some time to explore each option (indeed, as K approaches T, the low-regret guarantees of the learning algorithms we consider become vacuous).\n\nCritical mechanisms [Braverman et al., 2018] are a subset of option-based mechanisms that satisfy natural economic restrictions. In a critical mechanism, the buyer interacts with the mechanism each round by submitting a bid b. The buyer then receives the item with probability at(b) and pays a price pt(b). These allocation/payment rules should satisfy the following properties:\n\n• Individual rationality: pt(b) satisfies pt(b) ≤ b · at(b), i.e. a bidder should never be charged more than their bid in expectation.\n\n• Monotonicity: pt(b) and at(b) are weakly increasing in b, i.e., submitting a higher bid should never decrease the winning probability or the payment.\n\n• Overbidding is dominated: If the bidder's value is v, it should never be in their interest to submit a bid b > v, i.e. if b > v then v · at(v) − pt(v) > v · at(b) − pt(b) for all t.\n\nIn both option-based mechanisms and critical mechanisms, we assume that the seller is completely non-adaptive and sets the allocation/payment functions at the beginning of the protocol.\n\n2.2 No-regret learner\n\nIn contrast to a utility-maximizing buyer, we consider a buyer who follows some no-regret strategy for the multi-armed bandit problem. In a classic multi-armed bandit problem with T rounds, the learner (in our setting, the buyer) selects one of K options ('arms') on round t and receives a reward ri,t ∈ [0, 1] if he selects option i. The rewards can be chosen adversarially and the learner's objective is to maximize his total reward.\n\nLet it be the arm pulled by the learner at round t. The regret of a (possibly randomized) strategy A is defined as the difference between the performance of the best fixed arm and that of the strategy A: Reg(A) = maxi Σ_{t=1}^T ri,t − Σ_{t=1}^T rit,t. A strategy A for the multi-armed bandit problem is no-regret if the expected regret is sub-linear in T, i.e., E[Reg(A)] = o(T). In addition to the bandits setting, in which the learner only learns the reward of the arm he pulls, our results also apply to the experts setting, in which the learner learns the rewards of all arms in every round. In our setting, the buyer learns ai,t and pi,t, allowing him to compute the reward as ri,t = ai,t · vt − pi,t. Moreover, the buyer has the additional information of her value vt, and thus is in fact facing a contextual bandit problem.\n\nContextual Bandits. In a contextual bandit problem, the learner is additionally provided a context ct from a finite set C. The reward of pulling arm i under context c on round t is now given by ri,t(c). In the experts setting, the learner can obtain the values of ri,t(ct) for all arms i under context ct after round t, while in the bandits setting the learner only learns ri,t(ct) for the arm i he pulls.\n\nThe notion of regret for a strategy M can be easily extended to the contextual bandit problem by considering the best context-specific policy π: Reg(M) = maxπ:C→[K] Σ_{t=1}^T rπ(ct),t(ct) − Σ_{t=1}^T rit,t(ct). As before, a strategy M is no-regret if E[Reg(M)] = o(T). When the size of the context set C is a constant with respect to T, a no-regret strategy M for contextual bandits can be simply constructed from a no-regret strategy A for the classic bandit problem: maintain a separate instance of A for every context c ∈ C [Bubeck et al., 2012].\n\nAmong no-regret strategies, we are interested in the special class of mean-based strategies:\n\nDefinition 2 (Mean-based Strategy).
Let σi,t(c) = Σ_{s=1}^t ri,s(c) be the cumulative reward for pulling arm i under context c for the first t rounds. A strategy is γ-mean-based if, whenever σi,t(ct) < σj,t(ct) − γT, the probability that the strategy pulls arm i on round t is at most γ. A strategy is mean-based if it is γ-mean-based with γ = o(1).\n\nIntuitively, mean-based strategies are strategies that pick the arm that has historically performed the best. Braverman et al. [2018] show that many no-regret algorithms are mean-based, including commonly used variants of EXP3 (for the bandits setting), the Multiplicative Weights algorithm (for the experts setting) and the Follow-the-Perturbed-Leader algorithm (for the experts setting).\n\n3 Option-based Mechanisms\n\nIn this section, we demonstrate a mechanism that can extract full welfare from a mean-based no-regret learner even when the values are chosen adversarially.\n\n3.1 Warm-up: Extracting Full Welfare for V = {1, 2}\n\nConsider an additive approximation target ε > 0. It is without loss of generality to consider the case with 2(1 − ε) > 1: when 2(1 − ε) ≤ 1, the seller can simply implement a scheme with only one option that always allocates the item and charges a payment 2(1 − ε). We design an option-based mechanism with K = ⌈log ε / log(1 − ε)⌉ + 1 options in addition to the null option, in which the buyer receives and pays nothing for the entire time horizon. For the 0-th option, the buyer receives the item with probability a0,t = 1 and pays a price p0,t = 2(1 − ε) for all t. As for the remaining K − 1 options, let κi = ε/(1 − ε)^{i−1} · T. We will divide the timeline of the i-th option with 1 ≤ i ≤ K into five sessions (see Table 1 for details).\n\nFor convenience, let Si = (κi, κi+1].
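The five-session construction referenced above (Table 1) can be written out as a round-by-round (allocation, payment) schedule. A minimal sketch under the paper's parameters; `option_schedule` is an illustrative name, and ε is chosen exactly representable in binary so the float boundary comparisons are clean:

```python
def option_schedule(i, eps, T):
    """Per-round (allocation prob., payment) of the i-th option (Table 1).

    Outside the active window (kappa_i, kappa_{i+1}] the option gives and
    charges nothing (sessions Delta_1 / Delta_2).  Inside it, the item is
    always allocated: free for the first eps^3/(1-eps)^i * T rounds
    (session 0), price 2 for the last that many rounds (session 2), and
    price 1 in between (session 1).
    """
    kappa = lambda k: eps / (1 - eps) ** (k - 1) * T
    lo, hi = kappa(i), kappa(i + 1)
    edge = eps ** 3 / (1 - eps) ** i * T
    sched = []
    for t in range(T):
        if not lo <= t < hi:
            sched.append((0, 0))      # Delta_1 / Delta_2: inactive
        elif t < lo + edge:
            sched.append((1, 0))      # session 0: free items
        elif t >= hi - edge:
            sched.append((1, 2))      # session 2: price 2
        else:
            sched.append((1, 1))      # session 1: price 1
    return sched

# Check of the cancellation used later in the low-valuation analysis:
# at value v_t = 1, the utility gained in session 0 exactly cancels the
# loss in session 2, so cumulative utility is 0 once the window passes.
s = option_schedule(1, 0.25, 1000)
print(sum(a * 1 - p for a, p in s))  # → 0
```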
Intuitively, the i-th option is active when t ∈ Si, which spans Li = κi+1 − κi = ε²/(1 − ε)^i · T rounds. Among these Li rounds, the item is always allocated to the buyer with probability 1, while the payment changes as follows: the payment for the first εLi rounds is 0, the payment for the last εLi rounds is 2, and the payment for the remaining rounds is 1.\n\nSession | Start Time | End Time | Allocation Prob. | Payment\nΔ1 | 0 | κi | 0 | 0\n0 | κi | κi + ε³/(1 − ε)^i · T | 1 | 0\n1 | κi + ε³/(1 − ε)^i · T | κi+1 − ε³/(1 − ε)^i · T | 1 | 1\n2 | κi+1 − ε³/(1 − ε)^i · T | κi+1 | 1 | 2\nΔ2 | κi+1 | T | 0 | 0\n\nTable 1: Construction of the i-th option\n\nAssume the buyer is running a γ-mean-based algorithm. To analyze the revenue guarantee of our mechanism, we consider an arbitrary sequence of valuations (v1, · · · , vT) and Val = Σ_t vt. The high-level idea behind this construction is that for the high valuations, i.e., vt = 2, the utility σ0,t(2) keeps increasing as t increases for the high option (i = 0), while for the low options (i > 0) it only increases within the active period Si. Therefore, for sufficiently large t, we have σ0,t(2) > σi,t(2) for all i > 0, and therefore the buyer with high valuation will play the high option with high probability. As for vt = 1, the buyer plays the high option with probability at most γ since its payment is too high, and we argue that the buyer will play the i-th option with high probability when t ∈ Si.\n\nHigh valuation. Assume that vt = 2. First notice that the cumulative utility for playing the 0-th option is σ0,t(2) = εt · 2. Suppose t ∈ Si∗ for some i∗.
For i < i∗, the active period of the i-th option is already past, and the cumulative utility for playing the i-th option is at most\n\nσi,t(2) ≤ Li · 2 = ε²/(1 − ε)^i · T · 2 ≤ ε²/(1 − ε)^{i∗−1} · T · 2 = ε · κi∗ · 2 = σ0,t(2) − ε · (t − κi∗) · 2.\n\nAs for the i∗-th option, we have σi∗,t(2) ≤ (t − κi∗) · 2 = σ0,t(2) − (κi∗ − (1 − ε)t) · 2. Moreover, for any i-th option with i > i∗, we simply have σi,t(2) = 0. Therefore, the buyer with valuation vt = 2 for t ∈ Si∗ will play the 0-th option with probability at least 1 − Kγ when εt · 2 > γT, ε · (t − κi∗) · 2 > γT, and (κi∗ − (1 − ε)t) · 2 > γT, which holds whenever κi∗ + γ/(2ε) · T < t < κi∗+1 − γ/(2(1 − ε)) · T. Therefore, for each time period Si with 1 ≤ i ≤ K, there are at least Li − (γ/(2ε) + γ/(2(1 − ε))) · T rounds where the buyer has probability at least 1 − Kγ of playing the 0-th option, which contributes 2(1 − ε) revenue per round. Therefore, the expected revenue loss from time period Si is at most\n\n2ε · Li + (γ/(2ε) + γ/(2(1 − ε))) · T · 2 + Kγ · Li · 2,\n\nwhere 2ε · Li is the revenue loss from charging 2(1 − ε) and Kγ · Li · 2 is the expected revenue loss from playing an option other than the 0-th option.
Thus, the total expected revenue loss from the rounds when vt = 2 is at most\n\n(εT) · 2 + Σ_i [2ε · Li + (γ/(2ε) + γ/(2(1 − ε))) · T · 2 + Kγ · Li · 2] = O(εT),\n\nwhere (εT) · 2 is the revenue loss from the first εT rounds.\n\nLow valuation. Assume that vt = 1. First notice that after the first εT rounds, the cumulative utility for playing the 0-th option is σ0,t(1) = (1 − 2(1 − ε))t = −Ω(T). Since there is a null arm that provides cumulative utility 0, the buyer's probability of playing the 0-th option is at most γ.\n\nSuppose t ∈ Si∗ for some i∗. From our construction of the i-th option, for any i ≠ i∗ the buyer's cumulative utility of playing the i-th option is exactly 0: the buyer's utility gain is 0 in sessions Δ1, Δ2, and 1, while his utility gain from session 0 is exactly cancelled out by his utility loss from session 2, which leads to σi,t(1) = 0 for t > κi+1 or t < κi. As for the i∗-th option, we have\n\nσi∗,t(1) = t − κi∗ for t in session 0; ε³/(1 − ε)^{i∗} · T for t in session 1; κi∗+1 − t for t in session 2.\n\nTherefore, once κi∗ + γT < t < κi∗+1 − γT, the buyer with vt = 1 will play the i∗-th option with probability 1 − Kγ. Therefore, the expected revenue loss within the time period Si is at most 2γT + Kγ · Li, where Kγ · Li is the expected revenue loss from playing an option other than the i∗-th option.
Thus, the total revenue loss from the rounds with vt = 1 is at most εT + Σ_{i=1}^K (2γT + Kγ · Li) = O(εT), where εT is the revenue loss from the first εT rounds.\n\n3.2 Extracting Full Welfare for V = {1, · · · , H}\n\nWe provide an option-based mechanism with K = H · ⌈3H²/ε⌉ options that achieves an additive revenue loss of O(ln H · εT) for V = {1, · · · , H}. As usual, we assume that there is always a null option in which the buyer receives and pays nothing for the entire time horizon. For convenience, let Gi = Σ_{τ=1}^i 1/τ be the sum of the harmonic series up to i, and let α = 1 − 1/(3H). Moreover, κi,j = (G_H + 2α) · εT/H + (j − 1) · εT/(3H²), where i ∈ V and 1 ≤ j ≤ ⌈3H²/ε⌉. Although κi,j only depends on j, we still use the notation κi,j for clarity. We will divide the timeline of the (i, j)-th option into five sessions (see Table 2).\n\nSession | Start Time | End Time | Allocation Prob. | Payment\ninit | 0 | α · εT/H | 0 | i\n0 | α · εT/H | κi,j − (Gi + α) · εT/H | 0 | 0\nready | κi,j − (Gi + α) · εT/H | κi,j | 1 | 0\n1 | κi,j | κi,j+1 | 1 | i\n∅ | κi,j+1 | T | 0 | H\n\nTable 2: Construction of the (i, j)-th option\n\nAssume the buyer is running a γ-mean-based algorithm. To analyze the revenue guarantee of our mechanism, we consider an arbitrary sequence of valuations (v1, · · · , vT) and Val = Σ_t vt.\n\nIntuitively, the (i, j)-th option starts with an init session in which it does not allocate the item but charges a payment i, followed by a 0 session in which the option neither allocates nor charges anything. Therefore, the buyer will not play the (i, j)-th option before its ready session.
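The session boundaries of Table 2 can be computed directly from Gi, α, and κi,j. A sketch with illustrative parameters; `table2_boundaries` is an assumed name, and the ∅ session is spelled "empty" in code:

```python
from math import ceil

def table2_boundaries(i, j, H, eps, T):
    """Session boundaries (start, end) of the (i, j)-th option (Table 2).

    The option charges i without allocating in 'init', is silent in '0',
    allocates for free in 'ready', allocates at price i in '1', and
    charges H without allocating in 'empty' (the paper's null session).
    """
    G = lambda k: sum(1.0 / tau for tau in range(1, k + 1))   # harmonic sum G_k
    alpha = 1 - 1 / (3 * H)
    kappa = lambda jj: (G(H) + 2 * alpha) * eps * T / H + (jj - 1) * eps * T / (3 * H ** 2)
    return {
        "init":  (0, alpha * eps * T / H),
        "0":     (alpha * eps * T / H, kappa(j) - (G(i) + alpha) * eps * T / H),
        "ready": (kappa(j) - (G(i) + alpha) * eps * T / H, kappa(j)),
        "1":     (kappa(j), kappa(j + 1)),
        "empty": (kappa(j + 1), T),
    }

# Option count for V = {1, ..., H}: K = H * ceil(3 H^2 / eps)
H, eps, T = 4, 0.5, 10**6
K = H * ceil(3 * H ** 2 / eps)
print(K)  # → 384
```

Note that the width of each '1' session, κi,j+1 − κi,j, is εT/(3H²), independent of i and j.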
In the ready session, the option allocates the item for free, while in the 1 session the option allocates the item with a payment i. Our construction ensures that if vt = i for t ∈ (κi,j, κi,j+1], then the buyer will play the option (i, j) with high probability, which generates revenue i.\n\nLemma 3. If t ∈ (κi,j + γT, κi,j+1 − γT], then for any option (i′, j′) with i′ ≠ i or j′ ≠ j, σ(i,j),t(i) − σ(i′,j′),t(i) > γT.\n\nTherefore, for vt = i with t ∈ (κi,j + γT, κi,j+1 − γT], the buyer will play option (i, j) with probability at least 1 − Kγ, which generates revenue i per round. Thus, the revenue loss is at most\n\nH · (G_H + 2α) · εT/H + H · 2γT · K + Kγ · H · T = O(ln H · εT),\n\nwhere H · (G_H + 2α) · εT/H is the revenue loss for the first maxi κi,1 = (G_H + 2α) · εT/H rounds, H · 2γT · K is the revenue loss for t ∈ (κi,j, κi,j + γT] or t ∈ (κi,j+1 − γT, κi,j+1], and Kγ · H · T is the revenue loss from playing an undesired option.\n\nTheorem 4. If the buyer with V = {1, 2, · · · , H} is running a mean-based algorithm, then for any constant ε > 0, there exists a non-adaptive option-based mechanism with O(H³ ln H / ε) options for the seller which obtains revenue at least Val − O(εT).\n\n3.3 Extracting Full Welfare for V ⊆ [0, 1]\n\nLet ε be the parameter for the target additive revenue loss O(εT). For ease of presentation, we will rescale V to [0, H] such that H = 1/ε; it thus suffices to show that we can obtain O(T) loss in the scaled version.
First notice that it suffices to consider V ⊆ [1, H], since each valuation less than 1 contributes revenue loss at most 1.\n\nLemma 5. Consider vt such that i < vt < i + 1 and t ∈ (κi,j, κi,j+1]. Then, for any option (i′, j′) with i′ ∉ {i, i + 1} or j′ > j, max{σ(i,j),t(vt), σ(i+1,j),t(vt)} − σ(i′,j′),t(vt) > γT.\n\nTherefore, with probability at least 1 − Kγ, a buyer satisfying the requirement of Lemma 5 will play either option (i, j′) or option (i + 1, j′) with j′ ≤ j. Recall that in fact κi,j = κi+1,j for all i. Therefore, if the buyer plays option (i + 1, j), it will generate revenue i + 1, since option (i + 1, j) is also in its 1 session. Moreover, if the buyer plays option (i, j′) or (i + 1, j′) with j′ < j, then the option is already in its ∅ session and the buyer needs to pay H.\n\nThus, the revenue loss from vt is at most 1. Applying a similar argument as in Section 3.2, we can conclude that the expected revenue loss is O(T). Rescaling back to V = [0, 1], we have:\n\nTheorem 6. If the buyer with V ⊆ [0, 1] is running a mean-based algorithm, then for any constant ε > 0, there exists a non-adaptive option-based mechanism with O(ln(1/ε)/ε³) options for the seller which obtains revenue at least Val − O(εT).\n\nMeanwhile, we provide a lower bound on the option-complexity, which implies that the option-complexity of our algorithm is tight up to a polynomial factor in 1/ε.\n\nTheorem 7. If the buyer with V ⊆ [0, 1] is running a mean-based algorithm, an option-based mechanism which obtains expected revenue at least Val − O(εT) must have Ω(1/ε) options.\n\nProof. We first prove a lower bound for V = {1, 2, · · · , H}; the theorem will be a simple corollary of this lower bound. Let Ii,t(c) be a binary variable indicating whether the buyer with value vt = c plays the i-th option. Suppose there are K options in total, and let Pi(c) = Σ_{t=1}^T Pr[Ii,t(c) = 1] · pi,t be the expected total revenue obtained from the i-th option when the buyer's valuations are vt = c for all t. Since the expected total revenue is at least Val − O(εT), when the buyer's valuations are vt = 1 for all t the total expected revenue is at least T − μεT for some constant μ, so there must exist an option i∗ such that Pi∗(1) ≥ (1 − με)T / K. Moreover, let t∗ = sup{t | σi∗,t(1) ≥ −γT}; t∗ is well-defined since σi∗,0(1) = 0. Notice that for all t > t∗, since the buyer is running a mean-based algorithm, we have Pr[Ii∗,t(1)] ≤ γ due to the presence of the null option. Therefore, we have\n\nΣ_{t≤t∗} pi∗,t + Σ_{t>t∗} γ · pi∗,t ≥ Pi∗(1) ≥ (1 − με)T / K  ⟹  Σ_{t≤t∗} pi∗,t ≥ (1 − με)T / K − γHT,\n\nwhere we use the fact that 0 ≤ pi∗,t ≤ H. Notice that the cumulative utility σi∗,t∗(H) is\n\nσi∗,t∗(H) = Σ_{t≤t∗} (H · ai∗,t − pi∗,t) = H · σi∗,t∗(1) + (H − 1) Σ_{t≤t∗} pi∗,t ≥ (H − 1)(1 − με)T / K − γH²T.\n\nConsider an environment where the buyer's valuations are vt = H for all t. Since the buyer is running a no-regret algorithm, her cumulative utility for the first t∗ rounds is at least σi∗,t∗(H) − o(T).
This is true because, although the standard no-regret guarantee only applies at the final round T, the regret for the first t rounds must also be o(T) for any t < T. For the sake of contradiction, assume that the regret for the first t rounds is Ω(T). Notice that a no-regret algorithm does not depend on the future, so consider an environment where the rewards of all options after round t are set to 0; this results in Ω(T) regret at the final round T, a contradiction.

In addition, notice that the revenue loss from the first t∗ rounds is at least the buyer's cumulative utility, and thus the revenue loss is at least σi∗,t∗(H) − o(T) ≥ (H − 1)T / K − O(εT). Finally, since the total revenue loss over all T rounds is at least the revenue loss over the first t∗ rounds, achieving O(εT) revenue loss requires K = Ω(H / ε).

Observe that our proof only uses two sequences of valuations: a sequence of all 1's and a sequence of all H's. Thus, our lower bound also applies to the stochastic setting with unknown distributions.

4 Critical mechanisms

In this section we examine what the seller can accomplish when restricted to a critical mechanism.

With option-based mechanisms, we have shown in the previous section that it is possible to extract arbitrarily close to the full welfare even when the buyer's values are chosen adversarially. In contrast, Braverman et al. [2018] show that with a critical mechanism it is impossible to achieve even a constant-factor approximation to the buyer's welfare, even when the buyer's values are drawn from a distribution known to the seller.

Theorem 8 (Corollary C.13 of [Braverman et al., 2018]).
Let R(D) be the maximum possible revenue a seller using a non-adaptive critical mechanism can achieve when the buyer's values are drawn independently each round from distribution D. Then the ratio R(D)/Val(D) can be arbitrarily small. If D is supported on an interval [1, H], then this ratio can be as small as O(log log H / log H).

In Braverman et al. [2018], the authors also demonstrate how to construct a simple mechanism which achieves this maximum possible revenue (and hence this O(log log H / log H) competitive ratio to the welfare), but their construction requires detailed knowledge of the distribution D.

4.1 Values from an unknown distribution

We show that it is possible to achieve this same competitive ratio to the welfare in the prior-independent setting, where the seller does not know the distribution D but only a range [1, H] it is supported on. In our mechanism, at each time t the seller specifies a reserve price f(t), where f is a decreasing function with range [1, H] such that

f(t) = max(exp((1/C) · (1 − η − t/T)), 1),

where η = (1 + log H)^(−ε) and C = (1 − η)/(1 + log H) for ε ∈ (0, 1). In each round, if the buyer bids some b ≥ f(t), they receive the item and pay b; otherwise, they do not receive the item and pay nothing. More formally, the allocation and payment rules (at(b), pt(b)) are defined as follows: if b ≥ f(t), then pt(b) = b and at(b) = 1; otherwise, pt(b) = at(b) = 0.

Theorem 9. There is a non-adaptive critical mechanism for the seller which obtains expected revenue at least O(log log H / log H) · Val(D) from any buyer running a mean-based algorithm whose values are drawn independently each round from some distribution D supported on [1, H]. This mechanism depends only on H and not on D.

Consider the function x(v) : [1, H] → [0, 1] where x(v) = 1 − (1/T) · min{t : f(t) ≤ v}.
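Since the reserve schedule f is explicit, its basic properties can be checked numerically. The sketch below (H, ε, and T are arbitrary illustrative choices, not values prescribed by the paper) confirms that f is decreasing with f(T) = 1, and that x(v) as just defined matches the closed form η + C · log v used in the analysis:

```python
import math

# Reserve-price schedule from the text: f(t) = max(exp((1/C) * (1 - eta - t/T)), 1)
# with eta = (1 + log H)^(-eps) and C = (1 - eta) / (1 + log H).
H, eps, T = 1000, 0.5, 10**5   # illustrative choices only

eta = (1 + math.log(H)) ** (-eps)
C = (1 - eta) / (1 + math.log(H))

def f(t):
    return max(math.exp((1 - eta - t / T) / C), 1.0)

def x(v):
    """x(v) = 1 - (1/T) * min{t : f(t) <= v}: the fraction of rounds in which
    the reserve price is at most v."""
    t_star = next(t for t in range(T + 1) if f(t) <= v)  # first round with reserve <= v
    return 1 - t_star / T

# The closed form eta + C * log v matches the definition (up to 1/T rounding).
for v in (1.0, 10.0, 100.0, float(H)):
    assert abs(x(v) - (eta + C * math.log(v))) < 1e-3

print(eta, C, f(0), f(T))
```

The tolerance 1e-3 simply absorbs the 1/T discretization of the schedule.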
Note that x(v) equals the fraction of rounds in which a bidder with value v has value at least the reserve price f(t) (in particular, x(v) is an increasing function of v). It can be shown (Braverman et al. [2018], Section C) that if the buyer is mean-based, the revenue obtained by the seller using such an auction is given by R(D) = T · E_{v∼D}[v · x(v) − max_w (v − w) · x(w)] − o(T).

Lemma 10. If the seller is using a first-price auction with a decreasing reserve price, then R(D)/Val(D) is minimized when D is a singleton distribution.

Proof of Theorem 9. Note that for this choice of f, x(v) = η + C log v. By Lemma 10, R(D)/Val(D) is minimized when D is a singleton distribution. We therefore have that:

R(D) / (Val(D) · T) ≥ min_v (v · x(v) − max_w (v − w) · x(w)) / v = min_{v,w}