{"title": "A Sample Complexity Measure with Applications to Learning Optimal Auctions", "book": "Advances in Neural Information Processing Systems", "page_first": 5352, "page_last": 5359, "abstract": "We introduce a new sample complexity measure, which we refer to as split-sample growth rate. For any hypothesis space $H$ and for any sample $S$ of size $m$, the split-sample growth rate $\\hat{\\tau}_H(m)$ counts how many different hypotheses empirical risk minimization can output on any sub-sample of $S$ of size $m/2$. We show that the expected generalization error is upper bounded by $O\\left(\\sqrt{\\frac{\\log(\\hat{\\tau}_H(2m))}{m}}\\right)$. Our result is enabled by a strengthening of the Rademacher complexity analysis of the expected generalization error. We show that this sample complexity measure greatly simplifies the analysis of the sample complexity of optimal auction design for many auction classes studied in the literature. Their sample complexity can be derived solely by noticing that in these auction classes, ERM on any sample or sub-sample will pick parameters that are equal to one of the points in the sample.", "full_text": "A Sample Complexity Measure with Applications to Learning Optimal Auctions\n\nVasilis Syrgkanis\nMicrosoft Research\nvasy@microsoft.com\n\nAbstract\n\nWe introduce a new sample complexity measure, which we refer to as split-sample growth rate. For any hypothesis space H and for any sample S of size m, the split-sample growth rate τ̂_H(m) counts how many different hypotheses empirical risk minimization can output on any sub-sample of S of size m/2. We show that the expected generalization error is upper bounded by O(√(log(τ̂_H(2m))/m)). Our result is enabled by a strengthening of the Rademacher complexity analysis of the expected generalization error. We show that this sample complexity measure greatly simplifies the analysis of the sample complexity of optimal auction design for many auction classes studied in the literature. Their sample complexity can be derived solely by noticing that in these auction classes, ERM on any sample or sub-sample will pick parameters that are equal to one of the points in the sample.\n\n1 Introduction\n\nThe seminal work of [11] gave a recipe for designing the revenue-maximizing auction in auction settings where the private information of players is a single number and the distribution over this number is completely known to the auctioneer. The latter raises the question of how the auction designer has formed this prior distribution over the private information. Recent work, starting from [4], addresses the question of how to design optimal auctions when having access only to samples of values from the bidders. We refer the reader to [5] for an overview of the existing results in the literature. [4, 9, 10, 2] give bounds on the sample complexity of optimal auctions without computational efficiency, while recent work has also focused on getting computationally efficient learning bounds [5, 13, 6].\n\nThis work solely focuses on sample complexity and not computational efficiency and thus is most closely related to [4, 9, 10, 2]. The latter work uses tools from supervised learning, such as pseudo-dimension [12] (a variant of VC dimension for real-valued functions), compression bounds [8] and Rademacher complexity [12, 14], to bound the sample complexity of simple auction classes. Our work introduces a new measure of sample complexity, which is a strengthening of the Rademacher complexity analysis and hence could also be of independent interest outside the scope of the sample complexity of optimal auctions. 
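As a rough illustration of this measure (a sketch of ours, not part of the original analysis; all function names are hypothetical), one can brute-force the split-sample hypothesis set of a one-dimensional posted-reserve class: ERM only ever outputs a sampled value, so enumerating ERM over all half-size sub-samples counts the distinct outputs directly.

```python
from itertools import combinations
from math import ceil

def erm_reserve(sample):
    # ERM for a single-bidder posted-price auction: a revenue-maximizing
    # reserve can always be taken equal to one of the sampled values, since
    # any other price is weakly dominated. Ties are broken by a fixed rule
    # (smallest maximizing value).
    cands = sorted(set(sample))
    def emp_rev(r):
        return r * sum(1 for v in sample if v >= r) / len(sample)
    best = max(emp_rev(r) for r in cands)
    return min(r for r in cands if emp_rev(r) == best)

def split_sample_hypotheses(S):
    # Enumerate the ERM outputs over all sub-samples of size ceil(|S|/2);
    # the size of this set, over the worst-case S, is the split-sample
    # growth rate at |S|.
    k = ceil(len(S) / 2)
    return {erm_reserve(list(T)) for T in combinations(S, k)}

S = [0.2, 0.5, 0.7, 0.9]
H_hat = split_sample_hypotheses(S)
assert H_hat <= set(S)       # ERM only outputs sampled values,
assert len(H_hat) <= len(S)  # so the growth rate here is at most m
```

This brute force is exponential in m and is meant only to make the definition tangible; the point of the analysis below is that such counting can be done on paper.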
Moreover, for the case of auctions, this measure greatly simplifies the analysis of their sample complexity in many cases.\n\nIn particular, we show that in general PAC learning settings, the expected generalization error is upper bounded by the Rademacher complexity not of the whole class of hypotheses, but rather only of the class of hypotheses that could be the outcome of running Empirical Risk Minimization (ERM) on a subset of the samples of half the size. If the number of these hypotheses is small, then the latter immediately yields a small generalization error. We refer to the growth rate of this latter set of hypotheses as the split-sample growth rate. This measure of complexity is not restricted to auction design and could be relevant to general statistical learning theory.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\nWe then show that for many auction classes, such as single-item auctions with player-specific reserves, single-item t-level auctions and multiple-item item-pricing auctions with additive buyers, the split-sample growth rate can be very easily bounded. The argument boils down to just saying that Empirical Risk Minimization over these classes will set the parameters of the auction equal to some value of some player in the sample. Then a simple counting argument gives bounds of the same order as in prior work in the literature that used the pseudo-dimension [9, 10]. In multi-item settings we also get improvements on the sample complexity bound.\n\nSplit-sample growth rate is similar in spirit to the notion of local Rademacher complexity [3], which looks at the Rademacher complexity on a subset of hypotheses with small empirical error. In particular, our proof is based on a refinement of the classic Rademacher complexity analysis of generalization error (see e.g. [14]). However, our bound is more structural, restricting the set to outcomes of the chosen ERM process on a sub-sample of half the size. Moreover, we note that counting the number of possible outputs of ERM also has connections to a counting argument made in [1] in the context of pricing mechanisms. However, in essence the argument there is restricted to transductive settings, where the sample “features” are known in advance and fixed, and thereby the argument is much more straightforward and more similar to standard notions of “effective hypothesis space” used in VC-dimension arguments.\n\nOur new measure of sample complexity is applicable in the general statistical learning theory framework and hence could have applications beyond auctions. To convey a high-level intuition of settings where split-sample growth could simplify the sample complexity analysis, suppose that the output hypothesis of ERM is uniquely defined by a constant number of sample points (e.g. consider linear separators and assume that the loss is such that the output of ERM is uniquely characterized by choosing O(d) points from the sample). Then the number of possible hypotheses on any subset of size m/2 is at most O((m choose d)) = O(m^d). The split-sample growth rate analysis then immediately yields that the expected generalization error is O(√(d · log(m)/m)), or equivalently the sample complexity of learning over this hypothesis class to within an ε error is O(d · log(1/ε)/ε²).\n\n2 Preliminaries\n\nWe look at the sample complexity of optimal auctions. We consider the case of k items and n bidders. Each bidder has a value function v_i drawn independently from a distribution D_i, and we denote with D the joint distribution. We assume we are given a sample set S = {v1, . . . 
, vm} of m valuation vectors, where each v_t ∼ D. Let H denote the class of all dominant strategy truthful single-item auctions (i.e. auctions where no player has an incentive to report anything other than his true value to the auction, independent of what other players do). Moreover, let\n\nr(h, v) = Σ_{i=1}^n p^h_i(v)    (1)\n\nwhere p^h_i(·) is the payment function of mechanism h, and r(h, v) is the revenue of mechanism h on valuation vector v. Finally, let\n\nR_D(h) = E_{v∼D}[r(h, v)]    (2)\n\nbe the expected revenue of mechanism h under the true distribution of values D. Given a sample S of size m, we want to compute a dominant strategy truthful mechanism h_S such that:\n\nE_S[R_D(h_S)] ≥ sup_{h∈H} R_D(h) − ε(m)    (3)\n\nwhere ε(m) → 0 as m → ∞. We refer to ε(m) as the expected generalization error. Moreover, we define the sample complexity of an auction class as follows.\n\nDefinition 1 (Sample Complexity of Auction Class). The (additive error) sample complexity of an auction class H and a class of distributions D, for an accuracy target ε, is defined as the smallest number of samples m(ε) such that for any m ≥ m(ε):\n\nE_S[R_D(h_S)] ≥ sup_{h∈H} R_D(h) − ε    (4)\n\nWe might also be interested in a multiplicative error sample complexity, i.e.\n\nE_S[R_D(h_S)] ≥ (1 − ε) sup_{h∈H} R_D(h)    (5)\n\nThe latter is exactly the notion used in [4, 5]. If one assumes that the optimal revenue on the distribution is lower bounded by some constant quantity, then an additive error guarantee implies a multiplicative one. For instance, if one assumes that player values are bounded away from zero with significant probability, then that implies a lower bound on revenue. Such assumptions, for instance, are made in the work of [9]. 
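Before proceeding, the following minimal sketch (ours, with hypothetical names) makes the objects r(h, v) and empirical revenue maximization concrete for the single-item second-price class with an anonymous reserve that is revisited in Section 4; restricting the search to sampled values anticipates the counting argument used there.

```python
def second_price_revenue(values, reserve):
    # r(h, v): revenue of a second-price auction with anonymous reserve on
    # one valuation vector. The highest bidder wins iff she clears the
    # reserve, paying the larger of the reserve and the second-highest value.
    if max(values) < reserve:
        return 0.0
    top = sorted(values, reverse=True)
    second = top[1] if len(top) > 1 else 0.0
    return max(second, reserve)

def erm_reserve_second_price(sample):
    # Maximize the empirical revenue (1/m) * sum_t r(h, v_t) over anonymous
    # reserves; it suffices to search among the sampled values, since any
    # other reserve is weakly dominated by the next sampled value above it.
    # Ties are broken toward the smallest maximizing reserve.
    cands = sorted({v for vec in sample for v in vec})
    def emp_rev(r):
        return sum(second_price_revenue(vec, r) for vec in sample) / len(sample)
    best = max(emp_rev(r) for r in cands)
    return min(r for r in cands if emp_rev(r) == best)

# Two sampled valuation vectors for two bidders:
S = [[0.9, 0.3], [0.5, 0.4]]
r_hat = erm_reserve_second_price(S)
assert r_hat in {v for vec in S for v in vec}  # ERM outputs a sampled value
```

The final assertion is exactly the structural fact exploited later: the ERM output is always one of the n·m sampled values, so the number of distinct ERM outputs on sub-samples is easy to count.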
We will focus on additive error in this work. We will also be interested in proving high-probability guarantees, i.e. with probability 1 − δ:\n\nR_D(h_S) ≥ sup_{h∈H} R_D(h) − ε(m, δ)    (6)\n\nwhere for any δ, ε(m, δ) → 0 as m → ∞.\n\n3 Generalization Error via the Split-Sample Growth Rate\n\nWe turn to the general PAC learning framework and give generalization guarantees in terms of a new notion of complexity of a hypothesis space H, which we call the split-sample growth rate. Consider an arbitrary hypothesis space H and an arbitrary data space Z, and suppose we are given a set S of m samples {z_1, . . . , z_m}, where each z_t is drawn i.i.d. from some distribution D on Z. We are interested in maximizing some reward function r : H × Z → [0, 1] in expectation over distribution D. In particular, denote with R_D(h) = E_{z∼D}[r(h, z)]. We will look at the Empirical Reward Maximization algorithm on S, with some fixed tie-breaking rule. Specifically, if we let\n\nR_S(h) = (1/m) Σ_{t=1}^m r(h, z_t)    (7)\n\nthen ERM is defined as:\n\nh_S = arg sup_{h∈H} R_S(h)    (8)\n\nwhere ties are broken in some pre-defined manner. We define the notion of a split-sample hypothesis space:\n\nDefinition 2 (Split-Sample Hypothesis Space). For any sample S, let Ĥ_S denote the set of all hypotheses h_T output by the ERM algorithm (with the pre-defined tie-breaking rule) on any subset T ⊂ S of size ⌈|S|/2⌉, i.e.:\n\nĤ_S = {h_T : T ⊂ S, |T| = ⌈|S|/2⌉}    (9)\n\nBased on the split-sample hypothesis space, we also define the split-sample growth rate of a hypothesis space H at value m as the largest possible size of Ĥ_S for any set S of size m.\n\nDefinition 3 (Split-Sample Growth Rate). 
The split-sample growth rate of a hypothesis space H and an ERM process for H is defined as:\n\nτ̂_H(m) = sup_{S : |S| = m} |Ĥ_S|    (10)\n\nWe first show that the generalization error is upper bounded by the Rademacher complexity evaluated on the split-sample hypothesis space of the union of two samples of size m. The Rademacher complexity R(S, H) of a sample S of size m and a hypothesis space H is defined as:\n\nR(S, H) = E_σ[(2/m) sup_{h∈H} Σ_{z_t∈S} σ_t · r(h, z_t)]    (11)\n\nwhere σ = (σ_1, . . . , σ_m) and each σ_t is an independent binary random variable taking values in {−1, 1}, each with equal probability.\n\nLemma 1. For any hypothesis space H and any fixed ERM process, we have:\n\nE_S[R_D(h_S)] ≥ sup_{h∈H} R_D(h) − E_{S,S'}[R(S, Ĥ_{S∪S'})]    (12)\n\nwhere S and S' are two independent samples of some size m.\n\nProof. Let h* be the optimal hypothesis for distribution D. First we re-write the left hand side by adding and subtracting the expected empirical reward:\n\nE_S[R_D(h_S)] = E_S[R_S(h_S)] − E_S[R_S(h_S) − R_D(h_S)]\n ≥ E_S[R_S(h*)] − E_S[R_S(h_S) − R_D(h_S)]    (h_S maximizes empirical reward)\n = R_D(h*) − E_S[R_S(h_S) − R_D(h_S)]    (h* is independent of S)\n\nThus it suffices to upper bound the second quantity in the above equation. Since R_D(h) = E_{S'}[R_{S'}(h)] for a fresh sample S' of size m, we have:\n\nE_S[R_S(h_S) − R_D(h_S)] = E_S[R_S(h_S) − E_{S'}[R_{S'}(h_S)]] = E_{S,S'}[R_S(h_S) − R_{S'}(h_S)]\n\nNow consider the set Ĥ_{S∪S'}. Since S is a subset of S ∪ S' of size |S ∪ S'|/2, we have by the definition of the split-sample hypothesis space that h_S ∈ Ĥ_{S∪S'}. 
Thus we can upper bound the latter quantity by taking a supremum over h ∈ Ĥ_{S∪S'}:\n\nE_S[R_S(h_S) − R_D(h_S)] ≤ E_{S,S'}[sup_{h∈Ĥ_{S∪S'}} (R_S(h) − R_{S'}(h))]\n = E_{S,S'}[sup_{h∈Ĥ_{S∪S'}} (1/m) Σ_{t=1}^m (r(h, z_t) − r(h, z'_t))]\n\nNow observe that we can rename any sample z_t ∈ S to z'_t and any sample z'_t ∈ S' to z_t. By doing so we do not change the distribution. Moreover, we do not change the set Ĥ_{S∪S'}, since S ∪ S' is invariant to such swaps. Finally, we only change the sign of the quantity (r(h, z_t) − r(h, z'_t)). Thus, if we denote with σ_t ∈ {−1, 1} a Rademacher variable, we get that the above quantity is equal to:\n\nE_{S,S'}[sup_{h∈Ĥ_{S∪S'}} (1/m) Σ_{t=1}^m σ_t (r(h, z_t) − r(h, z'_t))]    (13)\n\nfor any fixed vector σ = (σ_1, . . . , σ_m) ∈ {−1, 1}^m. The latter also holds in expectation over σ, where each σ_t is drawn uniformly from {−1, 1}, each with equal probability. 
Hence:\n\nE_S[R_S(h_S) − R_D(h_S)] ≤ E_{S,S',σ}[sup_{h∈Ĥ_{S∪S'}} (1/m) Σ_{t=1}^m σ_t (r(h, z_t) − r(h, z'_t))]\n\nBy splitting the supremum into a positive and a negative part, and observing that the two expected quantities are identical, we get:\n\nE_S[R_S(h_S) − R_D(h_S)] ≤ 2 E_{S,S',σ}[sup_{h∈Ĥ_{S∪S'}} (1/m) Σ_{t=1}^m σ_t r(h, z_t)] = E_{S,S'}[R(S, Ĥ_{S∪S'})]\n\nwhere R(S, H) denotes the Rademacher complexity of a sample S and hypothesis space H.\n\nObserve that the latter lemma is a strengthening of the fact that the Rademacher complexity upper bounds the generalization error, simply because:\n\nE_{S,S'}[R(S, Ĥ_{S∪S'})] ≤ E_{S,S'}[R(S, H)] = E_S[R(S, H)]    (14)\n\nThus if we can bound the Rademacher complexity of H, then the latter lemma gives a bound on the generalization error. However, the reverse might not be true. Finally, we show our main theorem, which states that if the split-sample hypothesis space has small size, then we immediately get a generalization bound, without the need to further analyze the Rademacher complexity of H.\n\nTheorem 2 (Main Theorem). For any hypothesis space H and any fixed ERM process, we have:\n\nE_S[R_D(h_S)] ≥ sup_{h∈H} R_D(h) − √(2 log(τ̂_H(2m))/m)    (15)\n\nMoreover, with probability 1 − δ:\n\nR_D(h_S) ≥ sup_{h∈H} R_D(h) − (1/δ) √(2 log(τ̂_H(2m))/m)    (16)\n\nProof. By applying Massart's lemma (see e.g. [14]) we have:\n\nR(S, Ĥ_{S∪S'}) ≤ √(2 log(|Ĥ_{S∪S'}|)/m) ≤ √(2 log(τ̂_H(2m))/m)    (17)\n\nCombining the above with Lemma 1 yields the first part of the theorem. Finally, the high-probability statement follows from observing that the random variable sup_{h∈H} R_D(h) − R_D(h_S) is non-negative, and by applying Markov's inequality: with probability 1 − δ,\n\nsup_{h∈H} R_D(h) − R_D(h_S) ≤ (1/δ) E_S[sup_{h∈H} R_D(h) − R_D(h_S)] ≤ (1/δ) √(2 log(τ̂_H(2m))/m)    (18)\n\nThe latter theorem can be trivially extended to the case when r : H × Z → [α, β], leading to a bound of the form:\n\nE_S[R_D(h_S)] ≥ sup_{h∈H} R_D(h) − (β − α) √(2 log(τ̂_H(2m))/m)    (19)\n\nWe note that unlike the standard Rademacher complexity, which is defined as R(S, H), our bound, which is based on bounding R(S, Ĥ_{S∪S'}) for two samples S, S' of equal size, does not imply a high-probability bound via McDiarmid's inequality (see e.g. Chapter 26 of [14] for how this is done for the Rademacher complexity analysis), but only via Markov's inequality. The latter yields a worse dependence of the high-probability bound on the confidence δ, namely 1/δ rather than log(1/δ). The reason is that the quantity R(S, Ĥ_{S∪S'}) depends on the sample S not only through the points on which the hypotheses are evaluated, but also through the hypothesis space Ĥ_{S∪S'} itself. Hence, the function:\n\nf(z_1, . . . , z_m) = E_{S'}[sup_{h∈Ĥ_{{z_1,...,z_m}∪S'}} (1/m) Σ_{t=1}^m σ_t (r(h, z_t) − r(h, z'_t))]    (20)\n\ndoes not satisfy the stability property that |f(z) − f(z''_i, z_{−i})| ≤ 1/m. 
The reason is that the supremum is taken over a different hypothesis space for the two inputs. This is unlike the case of the function:\n\nf(z_1, . . . , z_m) = E_{S'}[sup_{h∈H} (1/m) Σ_{t=1}^m σ_t (r(h, z_t) − r(h, z'_t))]    (21)\n\nwhich is used in the standard Rademacher complexity bound analysis, and which does satisfy the latter stability property. Resolving whether this worse dependence on δ is necessary is an interesting open question.\n\n4 Sample Complexity of Auctions via Split-Sample Growth\n\nWe now present the application of the latter measure of complexity to the analysis of the sample complexity of revenue-optimal auctions. Throughout this section we assume that the revenue of any auction lies in the range [0, 1]. The results can be easily adapted to any other range [α, β] by re-scaling the equations, which will lead to blow-ups in the sample complexity of the order of an extra (β − α) multiplicative factor. This limits the results here to bounded distributions of values. However, as was shown in [5], one can always cap the distribution of values at some upper bound, for the case of regular distributions, by losing only an ε fraction of the revenue. So one can apply the results below to this capped distribution.\n\nSingle bidder and single item. Consider the case of a single-bidder, single-item auction. In this setting, it is known by results in auction theory [11] that an optimal auction belongs to the hypothesis class H = {post a reserve price r, for r ∈ [0, 1]}. We consider the ERM rule which, in the case of ties, favors reserve prices that are equal to some valuation v_t ∈ S. Without loss of generality, assume that the samples v1, . . . 
, vm are ordered in increasing order. Observe that for any set S, this ERM rule on any subset T of S will post a reserve price that is equal to some value v_t ∈ T. Any other reserve price in between two values [v_t, v_{t+1}] is weakly dominated by posting r = v_{t+1}, as this does not change which samples are allocated and can only increase revenue. Thus the space Ĥ_S is a subset of {post a reserve price r ∈ {v1, . . . , vm}}, which is of size m. Thus the split-sample growth rate of H is τ̂_H(m) ≤ m. This yields:\n\nE_S[R_D(h_S)] ≥ sup_{h∈H} R_D(h) − √(2 log(2m)/m)    (22)\n\nEquivalently, the sample complexity is m_H(ε) = O(log(1/ε)/ε²).\n\nMultiple i.i.d. regular bidders and single item. In this case, it is known by results in auction theory [11] that the optimal auction belongs to the space of hypotheses H consisting of second price auctions with some reserve r ∈ [0, 1]. Again, if we consider ERM which in case of ties favors a reserve equal to a value in the sample (assuming one is part of the tied set, and which outputs any other value otherwise), then observe that for any subset T of a sample S, ERM on that subset will pick a reserve price that is equal to one of the values in the sample S. Thus τ̂_H(m) ≤ n · m. This yields:\n\nE_S[R_D(h_S)] ≥ sup_{h∈H} R_D(h) − √(2 log(2 · n · m)/m)    (23)\n\nEquivalently, the sample complexity is m_H(ε) = O(log(n/ε²)/ε²).\n\nNon-i.i.d. regular bidders, single item, second price with player-specific reserves. In this case, it is known by results in auction theory [11] that the optimal auction belongs to the space of hypotheses H_SP consisting of second price auctions with some reserve r_i ∈ [0, 1] for each player i. 
Again, if we consider ERM which in case of ties favors a reserve equal to a value in the sample (assuming one is part of the tied set, and which outputs any other value otherwise), then observe that for any subset T of a sample S, ERM on that subset will pick a reserve price r_i that is equal to one of the values v^i_t of player i in the sample S. There are m such possible choices for each player, thus m^n possible choices of reserves in total. Thus τ̂_H(m) ≤ m^n. This yields:\n\nE_S[R_D(h_S)] ≥ sup_{h∈H_SP} R_D(h) − √(2n log(2m)/m)    (24)\n\nIf H is the space of all dominant strategy truthful mechanisms, then by prophet inequalities (see [7]) we know that sup_{h∈H_SP} R_D(h) ≥ (1/2) sup_{h∈H} R_D(h). Thus:\n\nE_S[R_D(h_S)] ≥ (1/2) sup_{h∈H} R_D(h) − √(2n log(2m)/m)    (25)\n\nNon-i.i.d. irregular bidders, single item. In this case, it is known by results in auction theory [11] that the optimal auction belongs to the space of hypotheses H consisting of all virtual welfare maximizing auctions: for each player i, pick a monotone function φ̂_i(v_i) ∈ [−1, 1] and allocate to the player with the highest non-negative virtual value, charging him the lowest value he could have bid and still win the item. In this case, we will first coarsen the space of all possible auctions. In particular, we will consider the class of t-level auctions of [9]. In this class, we constrain the value functions φ̂_i(v_i) to only take values in the discrete ε-grid in [0, 1]. We will call this class H_ε. An equivalent representation of these auctions is to say that for each player i we define a vector of thresholds 0 = θ^i_0 ≤ θ^i_1 ≤ . . . ≤ θ^i_s ≤ θ^i_{s+1} = 1, with s = 1/ε. The index of a player is the largest j for which v_i ≥ θ^i_j. 
Then we allocate the item to the player with the highest index (breaking ties lexicographically) and charge him the minimum value he has to bid to continue to win. Observe that on any sample S of valuation vectors, it is always weakly better to place the thresholds θ^i_j on one of the values in the set S. Any other threshold is weakly dominated, as it does not change the allocation. Thus for any subset T of a set S of size m, the thresholds of each player i will take one of the values of player i that appear in the set S. We have 1/ε thresholds for each player, hence m^{1/ε} combinations of thresholds for each player and m^{n/ε} combinations of thresholds for all players. Thus τ̂_H(m) ≤ m^{n/ε}. This yields:\n\nE_S[R_D(h_S)] ≥ sup_{h∈H_ε} R_D(h) − √(2n log(2m)/(ε · m))    (26)\n\nMoreover, by [9] we also have that:\n\nsup_{h∈H_ε} R_D(h) ≥ sup_{h∈H} R_D(h) − ε    (27)\n\nPicking ε = (2n log(2m)/m)^{1/3}, we get:\n\nE_S[R_D(h_S)] ≥ sup_{h∈H} R_D(h) − 2 (2n log(2m)/m)^{1/3}    (28)\n\nEquivalently, the sample complexity is m_H(ε) = O(n log(1/ε)/ε³).\n\nk items, n bidders, additive valuations, grand bundle pricing. If the reserve price is anonymous, then the reserve price output by ERM on any subset of a sample S of size m will take the value of one of the n · m total bundle values of the buyers in S. So τ̂_H(m) ≤ m · n. If the reserve price is not anonymous, then for each buyer ERM will pick one of that buyer's m total bundle values, so τ̂_H(m) ≤ m^n. 
Thus the sample complexity is m_H(ε) = O(n log(1/ε)/ε²).\n\nk items, n bidders, additive valuations, item prices. If reserve prices are anonymous, then each reserve price on item j computed by ERM on any subset of a sample S of size m will take the value of one of the players' values for item j, i.e. one of n · m values. So τ̂_H(m) ≤ (n · m)^k. If reserve prices are not anonymous, then the reserve price on item j for player i will take the value of one of player i's m values for that item. So τ̂_H(m) ≤ m^{n·k}. Thus the sample complexity is m_H(ε) = O(nk log(1/ε)/ε²).\n\nk items, n bidders, additive valuations, best of grand bundle pricing and item pricing. The number of possible outputs of ERM on the combination, on any subset of a sample S of size m, is at most the product of the corresponding counts for the two classes (bundle pricing and item pricing). Thus, for anonymous pricing τ̂_H(m) ≤ (m · n)^{k+1}, and for non-anonymous pricing τ̂_H(m) ≤ m^{n(k+1)}. Thus the sample complexity is m_H(ε) = O(n(k+1) log(1/ε)/ε²).\n\nIn the case of a single bidder, we know that the best of grand bundle pricing and item pricing is a 1/6 approximation to the overall best truthful mechanism for the true distribution of values, assuming values for each item are drawn independently. Thus in the latter case we have:\n\nE_S[R_D(h_S)] ≥ (1/6) sup_{h∈H} R_D(h) − √(2(k+1) log(2m)/m)    (29)\n\nwhere H is the class of all truthful mechanisms.\n\nComparison with [10]. The latter three applications were analyzed by [10] via the notion of the pseudo-dimension, but their results lead to sample complexity bounds of O(nk log(nk) log(1/ε)/ε²). Thus the simpler analysis above removes the extra log factor in the dependence on n and k.\n\nReferences\n\n[1] M. F. Balcan, A. Blum, J. D. 
Hartline, and Y. Mansour. Mechanism design via machine learning. In 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS '05), pages 605–614, Oct 2005.\n\n[2] Maria-Florina Balcan, Tuomas Sandholm, and Ellen Vitercik. Sample complexity of automated mechanism design. In Advances in Neural Information Processing Systems, pages 2083–2091, 2016.\n\n[3] Peter L. Bartlett, Olivier Bousquet, and Shahar Mendelson. Local Rademacher complexities. Annals of Statistics, 33(4):1497–1537, 2005.\n\n[4] Richard Cole and Tim Roughgarden. The sample complexity of revenue maximization. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing (STOC '14), pages 243–252. ACM, 2014.\n\n[5] Nikhil R. Devanur, Zhiyi Huang, and Christos-Alexandros Psomas. The sample complexity of auctions with side information. In Proceedings of the Forty-eighth Annual ACM Symposium on Theory of Computing, STOC '16, pages 426–439, New York, NY, USA, 2016. ACM.\n\n[6] Yannai A. Gonczarowski and Noam Nisan. Efficient empirical revenue maximization in single-parameter auction environments. CoRR, abs/1610.09976, 2016.\n\n[7] Jason D. Hartline and Tim Roughgarden. Simple versus optimal mechanisms. In Proceedings of the 10th ACM Conference on Electronic Commerce, EC '09, pages 225–234, New York, NY, USA, 2009. ACM.\n\n[8] Nick Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2(4):285–318, 1988.\n\n[9] Jamie Morgenstern and Tim Roughgarden. The pseudo-dimension of near-optimal auctions. In Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS '15, pages 136–144, Cambridge, MA, USA, 2015. MIT Press.\n\n[10] Jamie Morgenstern and Tim Roughgarden. Learning simple auctions. In COLT 2016, 2016.\n\n[11] Roger B. Myerson. Optimal auction design. Mathematics of Operations Research, 6(1):58–73, 1981.\n\n[12] D. Pollard. Convergence of Stochastic Processes. 
Springer Series in Statistics. 2011.\n\n[13] Tim Roughgarden and Okke Schrijvers. Ironing in the dark. In Proceedings of the 2016 ACM Conference on Economics and Computation, EC '16, pages 1–18, New York, NY, USA, 2016. ACM.\n\n[14] S. Shalev-Shwartz and S. Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.\n", "award": [], "sourceid": 2770, "authors": [{"given_name": "Vasilis", "family_name": "Syrgkanis", "institution": "Microsoft Research"}]}