{"title": "On the Pseudo-Dimension of Nearly Optimal Auctions", "book": "Advances in Neural Information Processing Systems", "page_first": 136, "page_last": 144, "abstract": "This paper develops a general approach, rooted in statistical learning theory, to learning an approximately revenue-maximizing auction from data. We introduce t-level auctions to interpolate between simple auctions, such as welfare maximization with reserve prices, and optimal auctions, thereby balancing the competing demands of expressivity and simplicity. We prove that such auctions have small representation error, in the sense that for every product distribution F over bidders\u2019 valuations, there exists a t-level auction with small t and expected revenue close to optimal. We show that the set of t-level auctions has modest pseudo-dimension (for polynomial t) and therefore leads to small learning error. One consequence of our results is that, in arbitrary single-parameter settings, one can learn a mechanism with expected revenue arbitrarily close to optimal from a polynomial number of samples.", "full_text": "The Pseudo-Dimension of Near-Optimal Auctions\n\nJamie Morgenstern\u21e4\n\nComputer and Information Science\n\nUniversity of Pennsylvania\n\nPhiladelphia, PA\n\njamiemor@cis.upenn.edu\n\nTim Roughgarden\nStanford University\n\nPalo Alto, CA\n\ntim@cs.stanford.edu\n\nAbstract\n\nThis paper develops a general approach, rooted in statistical learning theory, to\nlearning an approximately revenue-maximizing auction from data. We introduce\nt-level auctions to interpolate between simple auctions, such as welfare maximiza-\ntion with reserve prices, and optimal auctions, thereby balancing the competing\ndemands of expressivity and simplicity. We prove that such auctions have small\nrepresentation error, in the sense that for every product distribution F over bid-\nders\u2019 valuations, there exists a t-level auction with small t and expected revenue\nclose to optimal. 
We show that the set of t-level auctions has modest pseudo-dimension (for polynomial t) and therefore leads to small learning error. One consequence of our results is that, in arbitrary single-parameter settings, one can learn a mechanism with expected revenue arbitrarily close to optimal from a polynomial number of samples.

1 Introduction

In the traditional economic approach to identifying a revenue-maximizing auction, one first posits a prior distribution over all unknown information, and then solves for the auction that maximizes expected revenue with respect to this distribution. The first obstacle to making this approach operational is the difficulty of formulating an appropriate prior. The second obstacle is that, even if an appropriate prior distribution is available, the corresponding optimal auction can be far too complex and unintuitive for practical use. This motivates the goal of identifying auctions that are "simple" and yet nearly optimal in terms of expected revenue.

In this paper, we apply tools from learning theory to address both of these challenges. In our model, we assume that bidders' valuations (i.e., "willingness to pay") are drawn from an unknown distribution F. A learning algorithm is given i.i.d. samples from F. For example, these could represent the outcomes of comparable transactions that were observed in the past. The learning algorithm suggests an auction to use for future bidders, and its performance is measured by comparing the expected revenue of its output auction to that earned by the optimal auction for the distribution F.

The possible outputs of the learning algorithm correspond to some set C of auctions. We view C as a design parameter that can be selected by a seller, along with the learning algorithm.
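To make the model concrete, here is a minimal sketch of empirical revenue maximization over samples, using a toy class of second-price auctions with an anonymous reserve; the distribution, the candidate reserves, and all function names are illustrative assumptions rather than anything fixed by the paper.

```python
import random

def revenue_second_price_with_reserve(bids, r):
    """Revenue of a second-price auction with anonymous reserve r on one bid profile."""
    above = sorted((b for b in bids if b >= r), reverse=True)
    if not above:
        return 0.0  # no bidder clears the reserve: no sale
    # The winner pays the larger of the reserve and the second-highest clearing bid.
    return max(r, above[1]) if len(above) >= 2 else r

def empirical_revenue_maximizer(samples, reserves):
    """Return the candidate reserve with the highest average revenue on the samples."""
    return max(reserves, key=lambda r: sum(revenue_second_price_with_reserve(v, r)
                                           for v in samples) / len(samples))

random.seed(0)
# i.i.d. samples of valuation profiles from a hypothetical distribution over [1, H]^n,
# with H = 10 and n = 3 bidders.
samples = [[random.uniform(1, 10) for _ in range(3)] for _ in range(2000)]
best_r = empirical_revenue_maximizer(samples, [1 + 0.5 * k for k in range(19)])
```

The auction suggested for future bidders is the empirical revenue maximizer on the samples, which is exactly the selection rule analyzed (for richer classes C) in the rest of the paper.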
A central goal of this work is to identify classes C that balance representation error (the amount of revenue sacrificed by restricting to auctions in C) with learning error (the generalization error incurred by learning over C from samples). That is, we seek a set C that is rich enough to contain an auction that closely approximates an optimal auction (whatever F might be), yet simple enough that the best auction in C can be learned from a small amount of data. Learning theory offers tools both for rigorously defining the "simplicity" of a set C of auctions, through well-known complexity measures such as the pseudo-dimension, and for quantifying the amount of data necessary to identify the approximately best auction from C. Our goal of learning a near-optimal auction also requires understanding the representation error of different classes C; this task is problem-specific, and we develop the necessary arguments in this paper.

*Part of this work was done while visiting Stanford University. Partially supported by a Simons Award for Graduate Students in Theoretical Computer Science, as well as NSF grant CCF-1415460.

1.1 Our Contributions

The primary contributions of this paper are the following. First, we show that well-known concepts from statistical learning theory can be directly applied to reason about learning from data an approximately revenue-maximizing auction. Precisely, for a set C of auctions and an arbitrary unknown distribution F over valuations in [1, H], O((H/ε)² · dC log(H/ε)) samples from F are enough to learn (up to a 1 − ε factor) the best auction in C, where dC denotes the pseudo-dimension of the set C (defined in Section 2).
Second, we introduce the class of t-level auctions, to interpolate smoothly between simple auctions, such as welfare maximization subject to individualized reserve prices (when t = 1), and the complex auctions that can arise as optimal auctions (as t → ∞). Third, we prove that in quite general auction settings with n bidders, the pseudo-dimension of the set of t-level auctions is O(nt log nt). Fourth, we quantify the number t of levels required for the set of t-level auctions to have low representation error, with respect to the optimal auctions that arise from arbitrary product distributions F. For example, for single-item auctions and several generalizations thereof, if t = Ω(H/ε), then for every product distribution F there exists a t-level auction with expected revenue at least 1 − ε times that of the optimal auction for F.

In the above sense, the "t" in t-level auctions is a tunable "sweet spot", allowing a designer to balance the competing demands of expressivity (to achieve near-optimality) and simplicity (to achieve learnability). For example, given a fixed amount of past data, our results indicate how much auction complexity (in the form of the number of levels t) one can employ without risking overfitting the auction to the data. Alternatively, given a target approximation factor 1 − ε, our results give sufficient conditions on t, and consequently on the number of samples, needed to achieve this approximation factor. The resulting sample complexity upper bound has polynomial dependence on H, 1/ε, and the number n of bidders. Known results [1, 8] imply that any method of learning a (1 − ε)-approximate auction from samples must have sample complexity with polynomial dependence on all three of these parameters, even for single-item auctions.

1.2 Related Work

The present work shares much of its spirit and high-level goals with Balcan et al.
[4], who proposed applying statistical learning theory to the design of near-optimal auctions. The first-order difference between the two works is that our work assumes bidders' valuations are drawn from an unknown distribution, while Balcan et al. [4] study the more demanding "prior-free" setting. Since no auction can achieve near-optimal revenue ex post, Balcan et al. [4] define their revenue benchmark with respect to a set G of auctions: on each input v, it is the maximum revenue obtained by any auction of G on v. The idea of learning from samples enters the work of Balcan et al. [4] through the internal randomness of their partitioning of bidders, rather than through an exogenous distribution over inputs (as in this work). Both our work and theirs require polynomial dependence on H and 1/ε (ours in terms of a necessary number of samples, theirs in terms of a necessary number of bidders), as well as on a measure of the complexity of the class G (in our case, the pseudo-dimension; in theirs, an analogous measure). The primary improvement of our work over the results in Balcan et al. [4] is that our results apply to single-item auctions, matroid feasibility, and arbitrary single-parameter settings (see Section 2 for definitions), while their results apply only to single-parameter settings of unlimited supply.¹ We also view as a feature the fact that our sample complexity upper bounds can be deduced directly from well-known results in learning theory — we can focus instead on the non-trivial and problem-specific work of bounding the pseudo-dimension and representation error of well-chosen auction classes.

Elkind [12] also considers a similar model to ours, but only for the special case of single-item auctions. While her proposed auction format is similar to ours, our results cover the far more general

¹See Balcan et al.
[3] for an extension to the case of a large finite supply.

case of arbitrary single-parameter settings and non-finite-support distributions; our sample complexity bounds are also better even in the case of a single-item auction (linear rather than quadratic dependence on the number of bidders). On the other hand, the learning algorithm in [12] (for single-item auctions) is computationally efficient, while ours is not.

Cole and Roughgarden [8] study single-item auctions with n bidders whose valuations are drawn from independent (not necessarily identical) "regular" distributions (see Section 2), and prove upper and lower bounds (polynomial in n and 1/ε) on the sample complexity of learning a (1 − ε)-approximate auction. While the formalism in their work is inspired by learning theory, no formal connections are offered; in particular, both their upper and lower bounds were proved from scratch. Our positive results include single-item auctions as a very special case and, for bounded or MHR valuations, our sample complexity upper bounds are much better than those in Cole and Roughgarden [8].

Huang et al. [15] consider learning the optimal price from samples when there is a single buyer and a single seller; this problem was also studied implicitly in [10]. Our general positive results obviously cover the bounded-valuation and MHR settings in [15], though the specialized analysis in [15] yields better (indeed, almost optimal) sample complexity bounds as a function of 1/ε and/or H.

Medina and Mohri [17] show how to use a combination of the pseudo-dimension and Rademacher complexity to measure the sample complexity of selecting a single reserve price for the VCG mechanism to optimize revenue. In our notation, this corresponds to analyzing a single set C of auctions (VCG with a reserve). Medina and Mohri [17] do not address the expressivity vs.
simplicity trade-off that is central to this paper.

Dughmi et al. [11] also study the sample complexity of learning good auctions, but their main results are negative (exponential sample complexity), for the difficult scenario of multi-parameter settings. (All settings in this paper are single-parameter.)

Our work on t-level auctions also contributes to the literature on simple approximately revenue-maximizing auctions (e.g., [6, 14, 7, 9, 21, 24, 2]). Here, one takes the perspective of a seller who knows the valuation distribution F but is bound by a "simplicity constraint" on the auction deployed, thereby ruling out the optimal auction. Our results that bound the representation error of t-level auctions (Theorems 3.4, 4.1, 5.4, and 6.2) can be interpreted as a principled way to trade off the simplicity of an auction with its approximation guarantee. While previous work in this literature generally left the term "simple" safely undefined, this paper effectively proposes the pseudo-dimension of an auction class as a rigorous and quantifiable simplicity measure.

2 Preliminaries

This section reviews useful terminology and notation standard in Bayesian auction design and learning theory.

Bayesian Auction Design We consider single-parameter settings with n bidders. This means that each bidder has a single unknown parameter, its valuation or willingness to pay for "winning." (Every bidder has value 0 for losing.) A setting is specified by a collection X of subsets of {1, 2, . . . , n}; each such subset represents a collection of bidders that can simultaneously "win." For example, in a setting with k copies of an item, where no bidder wants more than one copy, X would be all subsets of {1, 2, . . .
, n} of cardinality at most k.

A generalization of this case, studied in the supplementary materials (Section 5), is matroid settings. These satisfy: (i) whenever X ∈ X and Y ⊆ X, we have Y ∈ X; and (ii) for any two sets I1, I2 ∈ X with |I1| < |I2|, there is always an augmenting element i2 ∈ I2 \ I1 such that I1 ∪ {i2} ∈ X. The supplementary materials (Section 6) also consider arbitrary single-parameter settings, where the only assumption is that ∅ ∈ X. To ease comprehension, we often illustrate our main ideas using single-item auctions (where X consists of the singletons and the empty set).

We assume bidders' valuations are drawn from a continuous joint cumulative distribution F. Except in the extension in Section 4, we assume that the support of F is limited to [1, H]^n. As in most of optimal auction theory [18], we usually assume that F is a product distribution, with F = F1 × F2 × · · · × Fn and each vi ∼ Fi drawn independently but not identically. The virtual value of bidder i is denoted by φi(vi) = vi − (1 − Fi(vi))/fi(vi). A distribution satisfies the monotone-hazard-rate (MHR) condition if fi(vi)/(1 − Fi(vi)) is nondecreasing; intuitively, if its tails are no heavier than those of an exponential distribution. In a fundamental paper, [18] proved that when every virtual valuation function is nondecreasing (the "regular" case), the auction that maximizes expected revenue for n Bayesian bidders chooses winners in a way which maximizes the sum of the virtual values of the winners. This auction is known as Myerson's auction, which we refer to as M.
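For intuition, the sketch below computes virtual values in closed form for two standard distributions and implements the virtual-welfare-maximizing allocation just described; the closed forms follow directly from the definition of φi above, and the helper names are our own.

```python
def virtual_value_uniform(v, b=1.0):
    """phi(v) = v - (1 - F(v))/f(v) for the uniform distribution on [0, b]:
    F(v) = v/b and f(v) = 1/b, so phi(v) = 2v - b (a regular distribution)."""
    return 2 * v - b

def virtual_value_exponential(v, lam=1.0):
    """For the exponential distribution, (1 - F(v))/f(v) = 1/lam, so phi(v) = v - 1/lam.
    The hazard rate f(v)/(1 - F(v)) = lam is constant, so the MHR condition holds."""
    return v - 1.0 / lam

def myerson_winner(values, virtual_values):
    """Myerson's allocation for one item: award it to the bidder with the highest
    positive virtual value, or to no one if every virtual value is nonpositive."""
    phis = [phi(v) for v, phi in zip(values, virtual_values)]
    best = max(range(len(values)), key=lambda i: phis[i])
    return best if phis[best] > 0 else None
```

For example, with two bidders uniform on [0, 1], the item is withheld whenever both values fall below the monopoly reserve 1/2, where φ crosses zero.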
The result can be extended to the general, "non-regular" case by replacing the virtual valuation functions by "ironed virtual valuation functions." The details are well understood but technical; see Myerson [18] and Hartline [13] for details.

Sample Complexity, VC Dimension, and the Pseudo-Dimension This section reviews several well-known definitions from learning theory. Suppose there is some domain Q, and let c be some unknown target function c : Q → {0, 1}. Let D be an unknown distribution over Q. We wish to understand how many labeled samples (x, c(x)), x ∼ D, are necessary and sufficient to be able to output a ĉ which agrees with c almost everywhere with respect to D. The distribution-independent sample complexity of learning c depends fundamentally on the "complexity" of the set of binary functions C from which we are choosing ĉ. We define the relevant complexity measure next.

Let S be a set of m samples from Q. The set S is said to be shattered by C if, for every subset T ⊆ S, there is some cT ∈ C such that cT(x) = 1 if x ∈ T and cT(y) = 0 if y ∉ T. That is, ranging over all c ∈ C induces all 2^|S| possible projections onto S. The VC dimension of C, denoted VC(C), is the size of the largest set S that can be shattered by C.

Let errS(ĉ) = (Σ_{x∈S} |c(x) − ĉ(x)|)/|S| denote the empirical error of ĉ on S, and let err(ĉ) = E_{x∼D}[|c(x) − ĉ(x)|] denote the true expected error of ĉ with respect to D. A key result from learning theory [23] is: for every distribution D, a sample S of size Ω((1/ε²)(VC(C) + ln(1/δ))) is sufficient to guarantee that errS(ĉ) ∈ [err(ĉ) − ε, err(ĉ) + ε] for every ĉ ∈ C with probability 1 − δ. In this case, the error on the sample is close to the true error, simultaneously for every hypothesis in C.
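The shattering condition above can be checked by brute force for small, finite classes. The sketch below does so for one-sided threshold classifiers x ↦ 1[x ≥ t], a class of VC dimension 1; the class and all names are illustrative.

```python
from itertools import combinations

def shatters(hypotheses, S):
    """True if the class realizes all 2^|S| binary labelings of the sample set S."""
    labelings = {tuple(h(x) for x in S) for h in hypotheses}
    return len(labelings) == 2 ** len(S)

def vc_dimension_at_most(hypotheses, domain, d):
    """True if no subset of the domain of size d + 1 is shattered."""
    return not any(shatters(hypotheses, S) for S in combinations(domain, d + 1))

# Thresholds x -> [x >= t]: any single point is shattered, but no pair x1 < x2 is,
# because the labeling (1, 0) can never be realized.
thresholds = [lambda x, t=t: x >= t for t in [0, 1, 2, 3, 4]]
```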
In particular, choosing the hypothesis with the minimum sample error minimizes the true error, up to 2ε. We say C is (ε, δ)-uniformly learnable with sample complexity m if, given a sample S of size m, with probability 1 − δ, for all c ∈ C, |errS(c) − err(c)| < ε; thus, any class C is (ε, δ)-uniformly learnable with m = Θ((1/ε²)(VC(C) + ln(1/δ))) samples. Conversely, for every learning algorithm A that uses fewer than VC(C)/ε samples, there exists a distribution D′ and a constant q such that, with probability at least q, A outputs a hypothesis ĉ′ ∈ C with err(ĉ′) > err(ĉ) + ε/2 for some ĉ ∈ C. That is, the true error of the output hypothesis is more than ε/2 larger than that of the best hypothesis in the class.

To learn real-valued functions, we need a generalization of the VC dimension (which concerns binary functions). The pseudo-dimension [19] does exactly this.² Formally, let c : Q → [0, H] be a real-valued function over Q, and C the class we are learning over. Let S be a sample drawn from D, |S| = m, labeled according to c. Both the empirical and true error of a hypothesis ĉ are defined as before, though |ĉ(x) − c(x)| can now take on values in [0, H] rather than in {0, 1}. Let (r1, . . . , rm) ∈ [0, H]^m be a set of targets for S. We say (r1, . . . , rm) witnesses the shattering of S by C if, for each T ⊆ S, there exists some cT ∈ C such that cT(xi) ≥ ri for all xi ∈ T and cT(xi) < ri for all xi ∉ T. If there exists some r⃗ witnessing the shattering of S, we say S is shatterable by C. The pseudo-dimension of C, denoted dC, is the size of the largest set S which is shatterable by C. The sample complexity upper bounds of this paper are derived from the following theorem, which states that the distribution-independent sample complexity of learning over a class of real-valued functions C is governed by the class's pseudo-dimension.

Theorem 2.1 [E.g.
[1]] Suppose C is a class of real-valued functions with range in [0, H] and pseudo-dimension dC. For every ε > 0 and δ ∈ [0, 1], the sample complexity of (ε, δ)-uniformly learning f with respect to C is m = O((H/ε)² (dC ln(H/ε) + ln(1/δ))).

Moreover, the guarantee in Theorem 2.1 is realized by the learning algorithm that simply outputs the function c ∈ C with the smallest empirical error on the sample.

²The fat-shattering dimension is a weaker condition that is also sufficient for sample complexity bounds. All of our arguments give the same upper bounds on the pseudo-dimension and the fat-shattering dimension of various auction classes, so we present the stronger statements.

Applying Pseudo-Dimension to Auction Classes For the remainder of this paper, we consider classes of truthful auctions C.³ When we discuss some auction c ∈ C, we treat c : [0, H]^n → R as the function that maps (truthful) bid tuples to the revenue achieved on them by the auction c. Then, rather than minimizing error, we aim to maximize revenue. In our setting, the guarantee of Theorem 2.1 directly implies that, with probability at least 1 − δ (over the m samples), the output of the empirical revenue maximization learning algorithm — which returns the auction c ∈ C with the highest average revenue on the samples — chooses an auction with expected revenue (over the true underlying distribution F) that is within an additive ε of the maximum possible.

3 Single-Item Auctions

To illustrate our ideas, we first focus on single-item auctions. The results of this section are generalized significantly in the supplementary materials (see Sections 5 and 6).

Section 3.1 defines the class of t-level single-item auctions, gives an example, and interprets the auctions as approximations to virtual welfare maximizers.
Section 3.2 proves that the pseudo-dimension of the set of such auctions is O(nt log nt), which by Theorem 2.1 implies a sample-complexity upper bound. Section 3.3 proves that taking t = Ω(H/ε) yields low representation error.

3.1 t-Level Auctions: The Single-Item Case

We now introduce t-level auctions, or Ct for short. Intuitively, one can think of each bidder as facing one of t possible prices; the price they face depends upon the values of the other bidders. Consider, for each bidder i, t numbers 0 ≤ ℓi,0 ≤ ℓi,1 ≤ · · · ≤ ℓi,t−1. We refer to these t numbers as thresholds. This set of tn numbers defines a t-level auction with the following allocation rule. Consider a valuation tuple v:

1. For each bidder i, let ti(vi) denote the index τ of the largest threshold ℓi,τ that lower bounds vi (or −1 if vi < ℓi,0). We call ti(vi) the level of bidder i.
2. Sort the bidders from highest level to lowest level and, within a level, use a fixed lexicographic tie-breaking ordering ≻ to pick the winner.⁴
3. Award the item to the first bidder in this sorted order (unless ti = −1 for every bidder i, in which case there is no sale).

The payment rule is the unique one that renders truthful bidding a dominant strategy and charges 0 to losing bidders — the winning bidder pays the lowest bid at which she would continue to win. It is important for us to understand this payment rule in detail; there are three interesting cases. Suppose bidder i is the winner. In the first case, i is the only bidder who might be allocated the item (all other bidders have level −1), in which case her bid must be at least her lowest threshold. In the second case, there are multiple bidders at her level, so she must bid high enough to be at her level (and, since ties are broken lexicographically, this is her threshold to win).
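A minimal sketch of the allocation rule above (steps 1-3), with the ordering ≻ taken to be lexicographic by bidder index; payments are omitted, and all names are our own.

```python
def level(thresholds, v):
    """Step 1: the index of the largest threshold that lower-bounds v, or -1 if none does."""
    lvl = -1
    for tau, ell in enumerate(thresholds):
        if v >= ell:
            lvl = tau
    return lvl

def t_level_winner(ell, v):
    """Steps 2 and 3: ell[i] lists bidder i's thresholds and v is the (truthful) bid tuple.
    The item goes to a highest-level bidder, ties broken by bidder index; None means no sale."""
    levels = [level(ell[i], v[i]) for i in range(len(v))]
    best = max(levels)
    if best == -1:
        return None
    return min(i for i in range(len(v)) if levels[i] == best)
```

With the thresholds of Example 3.1 below, `t_level_winner([[2, 4, 6, 8], [1.5, 5, 9, 10], [1.7, 3.9, 6, 7]], [8, 10, 5])` returns index 0 (bidder a): a and b tie at level 3 and the tie breaks in a's favor.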
In the final case, she need not compete at her level: she can choose either to pay one level above her competition (in which case her position in the tie-breaking ordering does not matter) or to bid at the same level as her highest-level competitors (in which case she only wins if she dominates all of those bidders according to ≻). Formally, the payment pi of the winner i (if any) is as follows. Let τ̄ denote the highest level τ such that there are at least two bidders at or above level τ, and let I be the set of bidders other than i whose level is at least τ̄.

Monop If τ̄ = −1, then pi = ℓi,0 (she is the only potential winner, but must have level at least 0 to win).

Mult If ti(vi) = τ̄, then pi = ℓi,τ̄ (she needs to be at level τ̄).

Unique If ti(vi) > τ̄, then: if i ≻ i′ for all i′ ∈ I, she pays pi = ℓi,τ̄; otherwise she pays pi = ℓi,τ̄+1 (she either needs to be at level τ̄ + 1, in which case her position in ≻ does not matter, or at level τ̄, in which case she would need to be the highest according to ≻).

We now describe a particular t-level auction, and demonstrate each case of the payment rule.

Example 3.1 Consider the following 4-level auction for bidders a, b, c. Let ℓa,· = [2, 4, 6, 8], ℓb,· = [1.5, 5, 9, 10], and ℓc,· = [1.7, 3.9, 6, 7]. For example, if bidder a bids less than 2 she is at level −1, a bid in [2, 4) puts her at level 0, a bid in [4, 6) at level 1, a bid in [6, 8) at level 2, and a bid of at least 8 at level 3. Let a ≻ b ≻ c.

Monop If va = 3, vb < 1.5, and vc < 1.7, then b and c are at level −1 (to which the item is never allocated).

Mult If va ≥ 8, vb ≥ 10, and vc < 7, then a and b are both at level 3, and a ≻ b, so a wins and pays 8 (the minimum she needs to bid to be at level 3).

Unique If va ≥ 8, vb ∈ [5, 9), and vc ∈ [3.9, 6), then a is at level 3, and b and c are at level 1. Since a ≻ b and a ≻ c, a need only pay 4 (enough to be at level 1).

³An auction is truthful if truthful bidding is a dominant strategy for every bidder. That is: for every bidder i, and all possible bids by the other bidders, i maximizes its expected utility (value minus price paid) by bidding its true value. In the single-parameter settings that we study, the expected revenue of the optimal non-truthful auction (measured at a Bayes-Nash equilibrium with respect to the prior distribution) is no larger than that of the optimal truthful auction.

⁴When the valuation distributions are regular, this tie-breaking can be done by value, or randomly; when it is done by value, this equates to a generalization of VCG with nonanonymous reserves (and is IC and has identical representation error as in this analysis when bidders are regular).
If, on the other hand, va ∈ [4, 6), vb ∈ [5, 9), and vc ≥ 6, then c has level at least 2 (while a and b have level 1), but c needs to pay 6 since a, b ≻ c. (In the Monop case above, a wins and pays 2, the minimum she needs to bid to be at level 0.)

Remark 3.2 (Connection to virtual valuation functions) t-level auctions are naturally interpreted as discrete approximations to virtual welfare maximizers, and our representation error bound in Theorem 3.4 makes this precise. Each level corresponds to a constraint of the form "If any bidder has level at least τ, do not sell to any bidder with level less than τ." We can interpret the ℓi,τ's (with fixed τ, ranging over bidders i) as the bidder values that map to some common virtual value. For example, 1-level auctions treat all values below the single threshold as having negative virtual value, and above the threshold use values as proxies for virtual values. 2-level auctions use the second threshold to refine virtual value estimates, and so on. With this interpretation, it is intuitively clear that as t → ∞, it is possible to estimate bidders' virtual valuation functions and thus approximate Myerson's optimal auction to arbitrary accuracy.

3.2 The Pseudo-Dimension of t-Level Auctions

This section shows that the pseudo-dimension of the class of t-level single-item auctions with n bidders is O(nt log nt). Combining this with Theorem 2.1 immediately yields sample complexity bounds (parameterized by t) for learning the best such auction from samples.

Theorem 3.3 For a fixed tie-breaking order, the set of n-bidder single-item t-level auctions has pseudo-dimension O(nt log(nt)).

Proof: Recall from Section 2 that we need to upper bound the size of every set that is shatterable using t-level auctions. Fix a set of samples S = v1, . . . , vm of size m and a potential witness R = r1, . . . , rm.
Each auction c induces a binary labeling of the samples vj of S (whether c's revenue on vj is at least rj or strictly less than rj). The set S is shattered with witness R if and only if the number of distinct labelings of S given by t-level auctions is 2^m.

We upper-bound the number of distinct labelings of S given by t-level auctions (for some fixed potential witness R), counting the labelings in two stages. Note that S involves nm numbers — one value vji for each bidder i in each sample j — and a t-level auction involves nt numbers — t thresholds ℓi,τ for each bidder. Call two t-level auctions with thresholds {ℓi,τ} and {ℓ̂i,τ} equivalent if:

1. the relative order of the ℓi,τ's agrees with that of the ℓ̂i,τ's, in that both induce the same permutation of {1, 2, . . . , n} × {0, 1, . . . , t − 1}; and
2. merging the sorted list of the vji's with the sorted list of the ℓi,τ's yields the same partition of the vji's as does merging it with the sorted list of the ℓ̂i,τ's.

Note that this is an equivalence relation. If two t-level auctions are equivalent, every comparison between a valuation and a threshold, or between two valuations, is resolved identically by those auctions.

Using the defining properties of equivalence, a crude upper bound on the number of equivalence classes is

(nt)! · ((nm + nt) choose nt) ≤ (nm + nt)^{2nt}.  (1)

We now upper-bound the number of distinct labelings of S that can be generated by t-level auctions in a single equivalence class C. First, as all comparisons between two numbers (valuations or thresholds) are resolved identically for all auctions in C, each bidder i in each sample vj of S is assigned the same level (across auctions in C), and the winner (if any) in each sample vj is constant across all of C.
By the same reasoning, the identity of the parameter that gives the winner's payment (some ℓi,τ) is uniquely determined by pairwise comparisons (recall Section 3.1) and hence is common across all auctions in C. The payments ℓi,τ, however, can vary across auctions in the equivalence class.

For a bidder i and level τ ∈ {0, 1, 2, . . . , t − 1}, let Si,τ ⊆ S be the subset of samples in which bidder i wins and pays ℓi,τ. The revenue obtained by each auction in C on a sample of Si,τ is simply ℓi,τ (and is independent of all other parameters of the auction). Thus, ranging over all t-level auctions in C generates at most |Si,τ| distinct binary labelings of Si,τ — the possible subsets of Si,τ for which an auction meets the corresponding target rj form a nested collection.

Summarizing, within the equivalence class C of t-level auctions, varying a parameter ℓi,τ generates at most |Si,τ| different labelings of the samples Si,τ and has no effect on the other samples. Since the subsets {Si,τ}i,τ are disjoint, varying all of the ℓi,τ's (i.e., ranging over C) generates at most

∏_{i=1}^{n} ∏_{τ=0}^{t−1} |Si,τ| ≤ m^{nt}  (2)

distinct labelings of S.

Combining (1) and (2), the class of all t-level auctions produces at most (nm + nt)^{3nt} distinct labelings of S. Since shattering S requires 2^m distinct labelings, we conclude that 2^m ≤ (nm + nt)^{3nt}, implying m = O(nt log nt) as claimed. ∎

3.3 The Representation Error of Single-Item t-Level Auctions

In this section, we show that for every bounded product distribution, there exists a t-level auction with expected revenue close to that of the optimal single-item auction. The analysis "rounds" an optimal auction to a t-level auction without losing much expected revenue.
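The rounding can be sketched as follows, assuming access to a bidder's inverse (ironed) virtual valuation function; the placement mirrors the construction analyzed next (monopoly reserve, then multiples of αε′, then powers of 1 + ε/2), but the function names and parameters are our own assumptions.

```python
from math import ceil, log

def rounding_thresholds(phi_inv, H, alpha, eps, eps_prime):
    """Thresholds discretizing a bidder's virtual value: phi_inv(0) is the monopoly
    reserve; the next ceil(1/eps_prime) thresholds track multiples of alpha * eps_prime
    on [0, alpha]; the rest track alpha * (1 + eps/2)^k up to virtual value H."""
    ell = [phi_inv(0.0)]
    k1 = ceil(1 / eps_prime)
    ell += [phi_inv(tau * alpha * eps_prime) for tau in range(1, k1 + 1)]
    k2 = ceil(log(H / alpha) / log(1 + eps / 2))
    ell += [phi_inv(alpha * (1 + eps / 2) ** k) for k in range(1, k2 + 1)]
    return ell

# Example: uniform values on [0, 1] have phi(v) = 2v - 1, so phi_inv(y) = (y + 1) / 2.
uniform_phi_inv = lambda y: (y + 1) / 2
ell = rounding_thresholds(uniform_phi_inv, H=4.0, alpha=1.0, eps=0.5, eps_prime=0.25)
```

Because phi_inv is nondecreasing and its arguments increase, the resulting threshold list is sorted, as the definition of a t-level auction requires.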
This is done using thresholds to approximate each bidder's virtual value: the lowest threshold at the bidder's monopoly reserve price, the next 1/ε thresholds at the values at which bidder i's virtual value surpasses successive multiples of ε, and the remaining thresholds at the values where bidder i's virtual value reaches powers of 1 + ε. Theorem 3.4 formalizes this intuition.

Theorem 3.4 Suppose F is a product distribution over [1, H]^n. If t = Ω(1/ε + log_{1+ε} H), then Ct contains a single-item auction with expected revenue at least 1 − ε times the optimal expected revenue.

Theorem 3.4 follows immediately from the following lemma, with α = γ = 1. We prove this more general result for later use.

Lemma 3.5 Consider n bidders with valuations in [0, H] and with P[maxi vi > α] ≥ γ. Then Ct contains a single-item auction with expected revenue at least 1 − ε times that of an optimal auction, for t = Θ(1/(εγ) + log_{1+ε}(H/α)).

Proof: Consider a fixed bidder i. We define t thresholds for i, bucketing i by her virtual value, and prove that the t-level auction A using these thresholds for each bidder closely approximates the expected revenue of the optimal auction M. Let ε′ be a parameter defined later.
The definition of the thresholds immediately implies the following.

1. A only allocates to bidders with non-negative ironed virtual values.

2. If there is no tie (that is, there is a unique bidder at the highest level), then i′ = i*.

3. When there is a tie at level τ, the virtual value of the winner of A is close to that of M: if τ ∈ [0, ⌈1/ε′⌉] then φ_{i′}(v_{i′}) − φ_{i*}(v_{i*}) ≤ αε′; if τ ∈ [⌈1/ε′⌉, ⌈1/ε′⌉ + ⌈log_{1+ε/2}(H/α)⌉], then φ_{i*}(v_{i*}) ≥ (1 − ε/2) · φ_{i′}(v_{i′}).

These facts imply that

    E_v[Rev(A)] = E_v[φ_{i*}(v_{i*})] ≥ (1 − ε/2) · E_v[φ_{i′}(v_{i′})] − αε′ = (1 − ε/2) · E_v[Rev(M)] − αε′.    (3)

The first and final equalities hold because the allocations of A and M depend only on the ironed virtual values, not on the values themselves; thus the expected ironed virtual value of each mechanism's winner equals its expected unironed virtual value, which in turn equals the mechanism's expected revenue (see [13], Chapter 3.5, for discussion).

Since P[max_i v_i > α] ≥ δ, it must be that E[Rev(M)] ≥ αδ (a posted price of α achieves this revenue). Combining this with (3) and setting ε′ = εδ/2 implies E_v[Rev(A)] ≥ (1 − ε) E_v[Rev(M)]. ∎

Combining Theorems 2.1 and 3.4 yields the following Corollary 3.6.

Corollary 3.6 Let F be a product distribution with all bidders' valuations in [1, H]. Assume that t = Θ(1/ε + log_{1+ε} H) and

    m = O( (H²/ε²) · (nt log(nt) log(H/ε) + log(1/δ)) ) = Õ(H²n/ε³).

Then with probability at least 1 − δ, the single-item empirical revenue maximizer of C_t on a set of m samples from F has expected revenue at least 1 − ε times that of the optimal auction.

Open Questions

There are some significant opportunities for follow-up research. First, there is much to do on the design of computationally efficient (in addition to sample-efficient) algorithms for learning a near-optimal auction. The present work focuses on sample complexity, and our learning algorithms are generally not computationally efficient.6 The general research agenda here is to identify auction classes C for various settings such that:

1. C has low representation error;

2. C has small pseudo-dimension;

3. There is a polynomial-time algorithm to find an approximately revenue-maximizing auction from C on a given set of samples.7

There are also interesting open questions on the statistical side, notably for multi-parameter problems. While the negative result in [11] rules out a universally good upper bound on the sample complexity of learning a near-optimal mechanism in multi-parameter settings, we suspect that positive results are possible for several interesting special cases.

5 Recall from Section 2 that φ_i denotes the virtual valuation function of bidder i. (From here on, we always mean the ironed version of virtual values.) It is convenient to assume that these functions are strictly increasing (not just nondecreasing); this can be enforced at the cost of losing an arbitrarily small amount of revenue.

6 There is a clear parallel with computational learning theory [22]: while the information-theoretic foundations of classification (VC dimension, etc.
[23]) have been long understood, this research area strives to understand which low-dimensional concept classes are learnable in polynomial time.

7 The sample-complexity and performance bounds implied by pseudo-dimension analysis, as in Theorem 2.1, hold with such an approximation algorithm, with the algorithm's approximation factor carrying through to the learning algorithm's guarantee. See also [4, 11].

References

[1] Martin Anthony and Peter L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, New York, NY, USA, 1999.

[2] Moshe Babaioff, Nicole Immorlica, Brendan Lucier, and S. Matthew Weinberg. A simple and approximately optimal mechanism for an additive buyer. SIGecom Exchanges, 13(2):31-35, January 2015.

[3] Maria-Florina Balcan, Avrim Blum, and Yishay Mansour. Single price mechanisms for revenue maximization in unlimited supply combinatorial auctions. Technical report, Carnegie Mellon University, 2007.

[4] Maria-Florina Balcan, Avrim Blum, Jason D. Hartline, and Yishay Mansour. Reducing mechanism design to algorithm design via machine learning. Journal of Computer and System Sciences, 74(8):1245-1270, 2008.

[5] Yang Cai and Constantinos Daskalakis. Extreme-value theorems for optimal multidimensional pricing. In Proceedings of the 52nd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 522-531, Palm Springs, CA, USA, October 2011. IEEE.

[6] Shuchi Chawla, Jason Hartline, and Robert Kleinberg. Algorithmic pricing via virtual valuations. In Proceedings of the 8th ACM Conference on Electronic Commerce, pages 243-251, New York, NY, USA, 2007. ACM.

[7] Shuchi Chawla, Jason D. Hartline, David L. Malec, and Balasubramanian Sivan. Multi-parameter mechanism design and sequential posted pricing. In Proceedings of the 42nd ACM Symposium on Theory of Computing, pages 311-320, New York, NY, USA, 2010. ACM.

[8] Richard Cole and Tim Roughgarden. The sample complexity of revenue maximization. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing, pages 243-252, New York, NY, USA, 2014. ACM.

[9] Nikhil Devanur, Jason Hartline, Anna Karlin, and Thach Nguyen. Prior-independent multi-parameter mechanism design. In Internet and Network Economics, pages 122-133. Springer, Singapore, 2011.

[10] Peerapong Dhangwatnotai, Tim Roughgarden, and Qiqi Yan. Revenue maximization with a single sample. In Proceedings of the 11th ACM Conference on Electronic Commerce, pages 129-138, New York, NY, USA, 2010. ACM.

[11] Shaddin Dughmi, Li Han, and Noam Nisan. Sampling and representation complexity of revenue maximization. In Web and Internet Economics, volume 8877 of Lecture Notes in Computer Science, pages 277-291. Springer International Publishing, Beijing, China, 2014.

[12] Edith Elkind. Designing and learning optimal finite support auctions. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 736-745. SIAM, 2007.

[13] Jason D. Hartline. Mechanism Design and Approximation. Book draft, 2015.

[14] Jason D. Hartline and Tim Roughgarden. Simple versus optimal mechanisms. In Proceedings of the ACM Conference on Electronic Commerce, Stanford, CA, USA, 2009. ACM.

[15] Zhiyi Huang, Yishay Mansour, and Tim Roughgarden. Making the most of your samples. arXiv:1407.2479, 2014. URL http://arxiv.org/abs/1407.2479.

[16] Michael J. Kearns and Umesh V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, Cambridge, MA, 1994.

[17] Andrés Muñoz Medina and Mehryar Mohri. Learning theory and algorithms for revenue optimization in second price auctions with reserve. In Proceedings of the 31st International Conference on Machine Learning, pages 262-270, 2014.

[18] Roger B. Myerson. Optimal auction design. Mathematics of Operations Research, 6(1):58-73, 1981.

[19] David Pollard. Convergence of Stochastic Processes. Springer-Verlag, New York, 1984.

[20] Tim Roughgarden and Okke Schrijvers. Ironing in the dark. Submitted, 2015.

[21] Tim Roughgarden, Inbal Talgam-Cohen, and Qiqi Yan. Supply-limiting mechanisms. In Proceedings of the 13th ACM Conference on Electronic Commerce, pages 844-861, New York, NY, USA, 2012. ACM.

[22] Leslie G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134-1142, 1984.

[23] Vladimir N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability & Its Applications, 16(2):264-280, 1971.

[24] Andrew Chi-Chih Yao. An n-to-1 bidder reduction for multi-item auctions and its applications. In Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 92-109, San Diego, CA, USA, 2015. SIAM.