{"title": "Procrastinating with Confidence: Near-Optimal, Anytime, Adaptive Algorithm Configuration", "book": "Advances in Neural Information Processing Systems", "page_first": 8883, "page_last": 8893, "abstract": "Algorithm configuration methods optimize the performance of a parameterized heuristic algorithm on a given distribution of problem instances. Recent work introduced an algorithm configuration procedure (``Structured Procrastination'') that provably achieves near-optimal performance with high probability and with nearly minimal runtime in the worst case. It also offers an anytime property: it keeps tightening its optimality guarantees the longer it is run. Unfortunately, Structured Procrastination is not adaptive to characteristics of the parameterized algorithm: it treats every input like the worst case. Follow-up work (``LeapsAndBounds'') achieves adaptivity but trades away the anytime property. This paper introduces a new algorithm, ``Structured Procrastination with Confidence'', that preserves the near-optimality and anytime properties of Structured Procrastination while adding adaptivity. In particular, the new algorithm will perform dramatically faster in settings where many algorithm configurations perform poorly. 
We show empirically both that such settings arise frequently in practice and that the anytime property is useful for finding good configurations quickly.", "full_text": "Procrastinating with Confidence: Near-Optimal, Anytime, Adaptive Algorithm Configuration

Robert Kleinberg
Department of Computer Science
Cornell University
rdk@cs.cornell.edu

Brendan Lucier
Microsoft Research
brlucier@microsoft.com

Kevin Leyton-Brown
Department of Computer Science
University of British Columbia
kevinlb@cs.ubc.ca

Devon Graham
Department of Computer Science
University of British Columbia
drgraham@cs.ubc.ca

Abstract

Algorithm configuration methods optimize the performance of a parameterized heuristic algorithm on a given distribution of problem instances. Recent work introduced an algorithm configuration procedure (“Structured Procrastination”) that provably achieves near-optimal performance with high probability and with nearly minimal runtime in the worst case. It also offers an anytime property: it keeps tightening its optimality guarantees the longer it is run. Unfortunately, Structured Procrastination is not adaptive to characteristics of the parameterized algorithm: it treats every input like the worst case. Follow-up work (“LeapsAndBounds”) achieves adaptivity but trades away the anytime property. This paper introduces a new algorithm, “Structured Procrastination with Confidence”, that preserves the near-optimality and anytime properties of Structured Procrastination while adding adaptivity. In particular, the new algorithm will perform dramatically faster in settings where many algorithm configurations perform poorly. 
We show empirically both that such settings arise frequently in practice and that the anytime property is useful for finding good configurations quickly.

1 Introduction

Algorithm configuration is the task of searching a space of configurations of a given algorithm (typically represented as joint assignments to a set of algorithm parameters) in order to find a single configuration that optimizes a performance objective on a given distribution of inputs. In this paper, we focus exclusively on the objective of minimizing average runtime. Considerable progress has recently been made on solving this problem in practice via general-purpose, heuristic techniques such as ParamILS (Hutter et al., 2007, 2009), GGA (Ansótegui et al., 2009, 2015), irace (Birattari et al., 2002; López-Ibáñez et al., 2011) and SMAC (Hutter et al., 2011a,b). Notably, in the context of this paper, all these methods are adaptive: they surpass their worst-case performance when presented with “easier” search problems.
Recently, algorithm configuration has also begun to attract theoretical analysis. While there is a large body of less-closely related work that we survey in Section 1.3, the first nontrivial worst-case performance guarantees for general algorithm configuration with an average runtime minimization objective were achieved by a recently introduced algorithm called Structured Procrastination (SP) (Kleinberg et al., 2017). This work considered a worst-case setting in which an adversary causes every deterministic choice to play out as poorly as possible, but where observations of random variables are unbiased samples.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

It is straightforward to argue that, in this setting, any fixed, deterministic heuristic for searching the space of configurations can be extremely unhelpful. 
The work therefore focuses on obtaining candidate configurations via random sampling (rather than, e.g., following gradients or taking the advice of a response surface model). Besides its use of heuristics, SMAC also devotes half its runtime to random sampling. Any method based on random sampling will eventually encounter the optimal configuration; the crucial question is the amount of time that this will take. The key result of Kleinberg et al. (2017) is that SP is guaranteed to find a near-optimal configuration with high probability, with worst-case running time that nearly matches a lower bound on what is possible and that asymptotically dominates that of existing alternatives such as SMAC.
Unfortunately, there is a fly in the ointment: SP turns out to be impractical in many cases, taking an extremely long time to run even on inputs that existing methods find easy. At the root, the issue is that SP treats every instance like the worst case, in which it is necessary to achieve a fine-grained understanding of every configuration’s runtime in order to distinguish between them. For example, if every configuration is very similar but most are not quite ε-optimal, subtle performance differences must be identified. SP thus runs every configuration enough times that with high probability the configuration’s runtime can accurately be estimated to within a 1 + ε factor.

1.1 LEAPSANDBOUNDS and CAPSANDRUNS

Weisz et al. (2018b) introduced a new algorithm, LEAPSANDBOUNDS (LB), that improves upon Structured Procrastination in several ways. First, LB improves upon SP’s worst-case performance, matching its information-theoretic lower bound on running time by eliminating a log factor. 
Second, LB does not require the user to specify a runtime cap that they would never be willing to exceed on any run, replacing this term in the analysis with the runtime of the optimal configuration, which is typically much smaller. Third, and most relevant to our work here, LB includes an adaptive mechanism, which takes advantage of the fact that when a configuration exhibits low variance across instances, its performance can be estimated accurately with a smaller number of samples. However, the easiest algorithm configuration problems are probably those in which a few configurations are much faster on average than all other configurations. (Empirically, many algorithm configuration instances exhibit just such non-worst-case behaviour; see our empirical investigation in the Supplementary Materials.) In such cases, it is clearly unnecessary to obtain high-precision estimates of each bad configuration’s runtime; instead, we only need to separate these configurations’ runtimes from that of the best alternative. LB offers no explicit mechanism for doing this. LB also has a key disadvantage when compared to SP: it is not anytime, but instead must be given fixed values of ε and δ. Because LB is adaptive, there is no way for a user to anticipate the amount of time that will be required to prove (ε, δ)-optimality, forcing a tradeoff between the risks of wasting available compute resources and of having to terminate LB before it returns an answer.
CAPSANDRUNS (CR) is a refinement of LB that was developed concurrently with the current paper; it has not been formally published, but was presented at an ICML 2018 workshop (Weisz et al., 2018a). CR maintains all of the benefits of LB, and furthermore introduces a second adaptive mechanism that does exploit variation in configurations’ mean runtimes. 
Like LB, it is not anytime.

1.2 Our Contributions

Our main contribution is a refined version of SP that maintains the anytime property while aiming to observe only as many samples as necessary to separate the runtime of each configuration from that of the best alternative. We call it “Structured Procrastination with Confidence” (SPC). SPC differs from SP in that it maintains a novel form of lower confidence bound as an indicator of the quality of a particular configuration, while SP simply uses that configuration’s sample mean. The consequence is that SPC spends much less time running poorly performing configurations, as other configurations quickly appear better and receive more attention. We initialize each lower bound with a trivial value: each configuration’s runtime is bounded below by the fastest possible runtime, κ_0. SPC then repeatedly evaluates the configuration that has the most promising lower bound.1 We perform these runs by “capping” (censoring) runs at progressively doubling multiples of κ_0. If a run does not complete, SPC “procrastinates”, deferring it until it has exhausted all runs with shorter captimes. Eventually, SPC observes enough completed runs of some configuration to obtain a nontrivial upper bound on its runtime. 

1While both SPC and CR use confidence bounds to guide search, they take different approaches. Rather than rejecting configurations whose lower bounds get too large, SPC focuses on configurations with small lower bounds. By allocating a greater proportion of total runtime to such promising configurations we both improve the bounds for configurations about which we are more uncertain and allot more resources to configurations with relatively low mean runtimes about which we are more confident.
At this point, it is able to start drawing high-probability conclusions that other configurations are worse.
Our paper is focused on a theoretical analysis of SPC. We show that it identifies an approximately optimal configuration using running time that is nearly the best possible in the worst case; however, so does SP. The key difference, and the subject of our main theorem, is that SPC also exhibits near-minimal runtime beyond the worst case, in the following sense. Define an (ε, δ)-suboptimal configuration to be one whose average runtime exceeds that of the optimal configuration by a factor of more than 1 + ε, even when the suboptimal configuration’s runs are capped so that a δ fraction of them fail to finish within the time limit. A straightforward information-theoretic argument shows that in order to verify that a configuration is (ε, δ)-suboptimal it is sufficient (and may also be necessary, in the worst case) to run it for O(ε^{-2} · δ^{-1} · OPT) time. The running time of SPC matches (up to logarithmic factors) the running time of a hypothetical “optimality verification procedure” that knows the identity of the optimal configuration, and for each suboptimal configuration i knows a pair (ε_i, δ_i) such that i is (ε_i, δ_i)-suboptimal and the product ε_i^{-2} · δ_i^{-1} is as small as possible.
SPC is anytime in the sense that it first identifies an (ε, δ)-optimal configuration for large values of ε and δ and then continues to refine these values as long as it is allowed to run. This is helpful for users who have difficulty setting these parameters up front, as already discussed. 
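Because SPC's captimes double geometrically, the total time it spends on any single instance is at most twice the largest captime used on that instance. The following quick check (our illustration, not code from the paper) verifies this geometric-series bound:

```python
# Illustrative check: with doubling captimes kappa0, 2*kappa0, 4*kappa0, ...,
# the total time charged to one instance is dominated by the final captime.

def captime_schedule(kappa0, k):
    """Captimes kappa0 * 2^0, ..., kappa0 * 2^k used on one instance."""
    return [kappa0 * 2 ** j for j in range(k + 1)]

def total_time(kappa0, k):
    """Total time spent on an instance that was attempted k + 1 times."""
    return sum(captime_schedule(kappa0, k))

# Geometric series: the total is always less than twice the last captime.
kappa0 = 1.0
for k in range(10):
    assert total_time(kappa0, k) < 2 * kappa0 * 2 ** k
```

This is the reason procrastination is cheap: deferring a run and retrying it later with a doubled captime wastes at most a constant factor relative to running it once with the final captime.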
SPC’s strategy for progressing iteratively through smaller and smaller values of ε and δ also has another advantage: it is actually faster than starting with the “final” values of ε and δ and applying them to each configuration. This is because extremely weak configurations can be dismissed cheaply based on large (ε, δ) values, instead of taking more samples to estimate their runtimes more finely.

1.3 Other Related Work

There is a large body of related work in the multi-armed bandits literature, which does not attack quite the same problem but does similarly leverage the “optimism in the face of uncertainty” paradigm and many tools of analysis (Lai & Robbins, 1985; Auer et al., 2002; Bubeck et al., 2012). We do not survey this work in detail as we have little to add to the extensive discussion by Kleinberg et al. (2017), but we briefly identify some dominant threads in that work. Perhaps the greatest contact between the communities has occurred in the sphere of hyperparameter optimization (Bergstra et al., 2011; Thornton et al., 2013; Li et al., 2016) and in the literature on bandits with correlated arms that scale to large experimental design settings (Kleinberg, 2006; Kleinberg et al., 2008; Chaudhuri et al., 2009; Bubeck et al., 2011; Srinivas et al., 2012; Cesa-Bianchi & Lugosi, 2012; Munos, 2014; Shahriari et al., 2016). In most of this literature, all arms have the same, fixed cost; others (Guha & Munagala, 2007; Tran-Thanh et al., 2012; Badanidiyuru et al., 2013) consider a model where costs are variable but always paid in full. (Conversely, in algorithm configuration we can stop runs that exceed a captime, yielding a potentially censored sample at bounded cost.) Some influential departures from this paradigm include Kandasamy et al. (2016), Ganchev et al. (2010), and most notably Li et al. 
(2016); reasons why these methods are nevertheless inappropriate for use in the algorithm configuration setting are discussed at length by Kleinberg et al. (2017).
Recent work has examined the learning-theoretic foundations of algorithm configuration, inspired in part by an influential paper of Gupta & Roughgarden (2017) that framed algorithm configuration and algorithm selection in terms of learning theory. This vein of work has not aimed at a general-purpose algorithm configuration procedure, as we do here, but has rather sought sample-efficient, special-purpose algorithms for particular classes of problems, including combinatorial partitioning problems (clustering, max-cut, etc.) (Balcan et al., 2017), branching strategies in tree search (Balcan et al., 2018b), and various algorithm selection problems (Balcan et al., 2018a). Nevertheless, this vein of work takes a perspective similar to our own and demonstrates that algorithm configuration has moved decisively from being solely the province of heuristic methods to being a topic for rigorous theoretical study.

2 Model

We define an algorithm configuration problem by the 4-tuple (N, Γ, R, κ_0), where these elements are defined as follows. N is a family of (potentially randomized) algorithms, which we call configurations to suggest that a single piece of code instantiates each algorithm under a different parameter setting. We do not assume that different configurations exhibit any sort of performance correlations, and so can capture the case of n distinct algorithms by imagining a “master algorithm” with a single, n-valued categorical parameter. Parameters are allowed to take continuous values: |N| can be uncountable. We typically use i to index configurations. Γ is a probability distribution over input instances. 
When the instance distribution is given implicitly by a finite benchmark set, let Γ be the uniform distribution over this set. We typically use j to index (input instance, random seed) pairs, to which we will hereafter refer simply as instances. R(i, j) is the execution time when configuration i ∈ N is run on input instance j. Given some value of θ > 0, we define R(i, j, θ) = min{R(i, j), θ}, the runtime capped at θ. κ_0 > 0 is a constant such that R(i, j) ≥ κ_0 for all configurations i and inputs j.
For any timeout threshold θ, let R_θ(i) = E_{j∼Γ}[R(i, j, θ)] denote the average θ-capped running time of configuration i, over input distribution Γ. Fixing some running time κ̄ = 2κ_0 that we will never be willing to exceed, the quantity R_κ̄(i) corresponds to the expected running time of configuration i and will be denoted simply by R(i). We will write OPT = min_i R(i). Given ε > 0, a goal is to find i* ∈ N such that R(i*) ≤ (1 + ε)OPT. We also consider a relaxed objective, where the running time of i* is capped at some threshold value θ for some small fraction δ of (instance, seed) pairs.
Definition 2.1. A configuration i* is (ε, δ)-optimal if there exists some threshold θ such that R_θ(i*) ≤ (1 + ε)OPT, and Pr_{j∼Γ}[R(i*, j) > θ] ≤ δ. Otherwise, we say i* is (ε, δ)-suboptimal.

3 Structured Procrastination with Confidence

In this section we present and analyze our algorithm configuration procedure, which is based on the “Structured Procrastination” principle introduced in Kleinberg et al. (2017). 
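On a finite benchmark (with Γ uniform), Definition 2.1 can be checked directly from a table of runtimes. The sketch below is our own illustration, not part of the paper's method; `is_eps_delta_optimal` and its threshold search are helpers we introduce, using the fact that it suffices to try the observed runtimes as candidate thresholds θ:

```python
# Check (eps, delta)-optimality of a configuration on a finite instance set
# (uniform distribution Gamma), directly from Definition 2.1.

def capped_mean(runtimes, theta):
    """Average theta-capped runtime R_theta(i) over a finite benchmark."""
    return sum(min(r, theta) for r in runtimes) / len(runtimes)

def is_eps_delta_optimal(runtimes_i, opt, eps, delta):
    """True iff some threshold theta gives R_theta(i) <= (1 + eps) * OPT while
    at most a delta fraction of runs exceed theta. Since the capped mean rises
    and the tail mass falls as theta grows, it suffices to try the observed
    runtimes of i as candidate thresholds."""
    n = len(runtimes_i)
    for theta in sorted(set(runtimes_i)):
        mass_above = sum(1 for r in runtimes_i if r > theta) / n
        if capped_mean(runtimes_i, theta) <= (1 + eps) * opt and mass_above <= delta:
            return True
    return False

# Two configurations: a fast one (OPT = 100ms) and a uniformly slow one.
fast = [100.0] * 10
slow = [1000.0] * 10
opt = sum(fast) / len(fast)
assert is_eps_delta_optimal(fast, opt, eps=0.01, delta=0.1)
assert not is_eps_delta_optimal(slow, opt, eps=0.5, delta=0.1)
# The delta relaxation forgives one pathological instance out of ten:
assert is_eps_delta_optimal([100.0] * 9 + [10000.0], opt, eps=0.0, delta=0.1)
```

The last assertion illustrates why the relaxation matters: a configuration that is fast on 90% of instances but times out on the rest still qualifies as (0, 0.1)-optimal.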
We call the procedure SPC (Structured Procrastination with Confidence) because, compared with the original Structured Procrastination algorithm, the main innovation is that instead of approximating the running time of each configuration by taking Õ(1/ε²) samples for some ε, it approximates it using a lower confidence bound that becomes progressively tighter as the number of samples increases. We focus on the case where N, the set of all configurations, is finite and can be iterated over explicitly. Our main result for this case is given as Theorem 3.4. In Section 4 we extend SPC to handle large or infinite spaces of configurations where full enumeration is impossible or impractical.

3.1 Description of the algorithm

The algorithm is best described in terms of two components: a “thread pool” of subroutines called configuration testers, each tasked with testing one particular configuration, and a scheduler that controls the allocation of time to the different configuration testers. Because the algorithm is structured in this way, it lends itself well to parallelization, but in this section we will present and analyze it as a sequential algorithm.
Each configuration tester provides, at all times, a lower confidence bound (LCB) on the average running time of its configuration. The rule for computing the LCB will be specified below; it is designed so that (with probability tending to 1 as time goes on) the LCB is less than or equal to the true average running time. The scheduler runs a main loop whose iterations are numbered t = 1, 2, . . .. In each iteration t, it polls all of the configuration testers for their LCBs, selects the one with the minimum LCB, and passes control to that configuration tester. The loop iteration ends when the tester passes control back to the scheduler. 
SPC is an anytime algorithm, so the scheduler’s main loop is infinite; if it is prompted to return a candidate configuration at any time, the algorithm will poll each configuration tester for its “score” (described below) and then output the configuration whose tester reported the maximum score.
The way each configuration tester i operates is best visualized as follows. There is an infinite stream of i.i.d. random instances j_1, j_2, . . . that the tester processes. Each of them is either completed, pending (meaning we ran the configuration on that instance at least once, but it timed out before completing), or inactive. An instance that is completed or pending will be called active.

Algorithm 1: Structured Procrastination w/ Confidence
require: Set N of n algorithm configurations
require: Lower bound on runtime, κ_0
// Initializations
1  t := 0
2  for i ∈ N do
3      C_i := new Configuration Tester for i
4      C_i.Initialize()
// Main loop. Run until interrupted.
5  repeat
6      i := arg min_{i∈N} C_i.GetLCB()    // GetLCB() returns LCB as described in the text.
7      C_i.ExecuteStep()
8  until anytime search is interrupted
9  return i* = arg max_{i∈N} {C_i.GetNumActive()}
// Configuration Testing Controller.
10 Class ConfigurationTester()

Configuration tester i maintains state variables θ_i and r_i such that the following invariants are satisfied at all times: (1) the first r_i instances in the stream are active and the rest are inactive; (2) the number of pending instances is at most q = q(r_i, t) = 50 log(t log r_i); (3) every pending instance has been attempted with timeout θ_i, and no instance has been attempted with timeout greater than 2θ_i. 
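The scheduler and tester mechanics described above can be sketched in Python as follows. This is an illustrative simplification rather than the paper's implementation: the lower confidence bound is stubbed out as the empirical mean of capped observations (the real algorithm uses the bound L(G_i, r_i, t) defined below), and the queue-size rule q(r_i, t) is reduced to a constant `max_pending`:

```python
# Illustrative sketch of Algorithm 1 (not the paper's implementation): a
# scheduler that always passes control to the tester with the smallest lower
# bound, and testers that defer ("procrastinate") timed-out runs, doubling
# their captimes.
from collections import deque

class ConfigurationTester:
    def __init__(self, run_fn, kappa0=1.0, max_pending=4):
        self.run_fn = run_fn          # run_fn(j) -> true runtime of instance j
        self.kappa0 = kappa0          # lower bound on any runtime
        self.max_pending = max_pending
        self.queue = deque()          # pending (instance, next captime) pairs
        self.obs = {}                 # latest capped observation per instance
        self.r = 0                    # number of active instances

    def get_lcb(self):
        if not self.obs:
            return self.kappa0        # trivial initial lower bound
        return sum(self.obs.values()) / len(self.obs)  # stand-in for the LCB

    def execute_step(self):
        if len(self.queue) >= self.max_pending:
            j, theta = self.queue.popleft()   # retry longest-deferred instance
        else:
            self.r += 1                       # activate a fresh instance
            j, theta = self.r, self.kappa0
        true_time = self.run_fn(j)
        self.obs[j] = min(true_time, theta)   # censored observation R(i, j, theta)
        if true_time > theta:                 # timed out: procrastinate
            self.queue.append((j, 2 * theta))

def spc(testers, iterations):
    for _ in range(iterations):               # anytime main loop
        min(testers.values(), key=ConfigurationTester.get_lcb).execute_step()
    # A tester's score is r, its number of active instances.
    return max(testers, key=lambda name: testers[name].r)

testers = {
    "fast": ConfigurationTester(lambda j: 4.0),
    "slow": ConfigurationTester(lambda j: 40.0),
}
best = spc(testers, iterations=200)
```

Even with the crude mean-based stub, the slow configuration's observations climb toward 40 while the fast one's level off near 4, so the scheduler's attention (and hence the active-instance score) concentrates on the fast configuration.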
To maintain these invariants, configuration tester i maintains a queue of pending instances, each with a timeout parameter representing the timeout threshold to be used the next time the configuration attempts to solve the instance. When the scheduler passes control to configuration tester i, it either runs the pending instance at the head of its queue (if the queue has q(r_i, t) elements) or it selects an inactive instance from the head of the i.i.d. stream and runs it with timeout threshold θ_i. In both cases, if the run exceeds its timeout, it is reinserted into the back of the queue with the timeout threshold doubled.
At any time, if configuration tester i is asked to return a score (for the purpose of selecting a candidate optimal configuration) it simply outputs r_i, the number of active instances. The logic justifying this choice of score function is that the scheduler devotes more time to promising configurations than to those that appear suboptimal; furthermore, better configurations run faster on average and so complete a greater number of runs. This dual tendency of near-optimal configuration testers to be allocated a greater amount of running time and to complete a greater number of runs per unit time makes the number of active instances a strong indicator of the quality of a configuration, as we formalize in the analysis.
We must finally specify how configuration tester i computes its lower confidence bound on R(i); see Figure 1 for an illustration. Recall that the configuration tester has a state variable θ_i and that for every active instance j, the value R(i, j, θ_i) is already known because i has either completed instance j, or it has attempted instance j with timeout threshold θ_i. 
require: Sequence j_1, j_2, . . . of instances
require: Global iteration counter, t
11 Procedure Initialize()
12     r := 0, θ := κ_0, q := 1
13     Q := empty double-ended queue
14 Procedure ExecuteStep()
15     t := t + 1
16     if |Q| < q then        // Replenish queue
17         r := r + 1
18         ℓ := r
19     else
20         Remove (ℓ, θ′) from head of Q
21         θ := θ′
22     if RUN(i, j_ℓ, θ) terminates in time τ ≤ θ then
23         R_{iℓθ} := τ
24     else
25         R_{iℓθ} := θ
26         Insert (ℓ, 2θ) at tail of Q
27     q := ⌈25 log(t log r)⌉
28 Procedure GetNumActive()
29     return r

Given some iteration of the algorithm, define G to be the empirical cumulative distribution function (CDF) of R(i, j, θ_i) as j ranges over all the active instances. A natural estimation of R_{θ_i}(i) would be the expectation of this empirical distribution, ∫_0^∞ (1 − G(x)) dx. Our lower bound will be the expectation of a modified CDF, found by scaling G non-uniformly toward 1. To formally describe the modification we require some definitions. Here and throughout this paper, we use the notation log(·) to denote the base-2 logarithm and ln(·) to denote the natural logarithm. Let ε(k, r, t) = √(9 · 2^k ln(kt) / r).
The scaling factor we use, (1 + \u270f(k, r, t)), depends on\nthe value of p; speci\ufb01cally, it increases with k = blog(1/p)c. In other words, we scale G(x) more\naggressively as G(x) gets closer to 1. If p is too small as a function of r and t, then we give up\non scaling it and instead set it all the way to (p, r, t) = 0. To see this, note that for k such that\n2k \uf8ff p < 21k, if k is large enough then we will have that \u270f(k, r, t) > 1/2 so the second case of\nEquation (1) applies.\nWe also note that L(Gi, ri, t) can be explicitly computed. Observe that Gi(x) is actually a step\nfunction with at most ri steps and that Gi(x) = 1 for x >\u2713 i, so the integral de\ufb01ning L(Gi, ri, t)\nis actually a \ufb01nite sum that can be computed in O(ri) time, given a sorted list of the elements of\n{R(i, j, \u2713i) | j active}. Example 3.1 illustrates the gains SPC can offer over SP.\nExample 3.1. Suppose that there are two con\ufb01gurations: one that takes 100ms on every input and\nanother that takes 1000ms. With \uf8ff0 = 1ms, \u270f = 0.01, and \u21e3 = 0.1, SP will set the initial queue size\nof each con\ufb01guration to be at least2 7500, because the queue size is initialized with a value that is at\nleast 12\u270f2 ln(3n/\u21e3). It will run each con\ufb01guration 7500 times with a timeout of 1ms, then it will\nrun each of them 7500 times with a timeout of 2ms, then 4ms, and so on, until it reaches 128ms. At\nthat point it exceeds 100ms, so the \ufb01rst con\ufb01guration will solve all instances in its queue. 
However, for the first 2 · 7500 · (1 + 2 + 4 + · · · + 64) = 1.9 × 10^6 milliseconds of running the algorithm (more than half an hour) essentially nothing happens: SP obtains no evidence of the superiority of the first configuration.
In contrast, SPC maintains more modest queue sizes, and thus runs each configuration on fewer instances before running them with a timeout of 128ms, at which point it can distinguish between the two. During the first 5000 iterations of SPC, the size of each configuration’s instance queue is at most 400. This is because r_i ≤ t, and t ≤ 5000, so q_i ≤ 25 log(5000 log(5000)) < 400. Further, observe that 5000 iterations is sufficient for SPC to attempt to run both configurations on some instance with a cutoff of 128ms, since each configuration will first run at most 400 instances with cutoff 1ms, then at most 400 instances with cutoff 2ms, and so on. Continuing up to 64ms, for both configurations, takes a total of 2 · log(64) · 400 = 4800 < 5000 iterations. Thus, it takes at most 2 · 400 · (1 + 2 + 4 + ... + 64) = 101,600 milliseconds (less than two minutes) before SPC runs each configuration on some instance with cutoff time 128ms. We see that SPC requires significantly less time (in this example, almost a factor of 20 less) to reach the point where it can distinguish between the two configurations.

3.2 Justification of lower confidence bound

In this section we will show that for any configuration i and any iteration t, with probability 1 − O(t^{−5/4}) the inequality L(G_i, r_i, t) ≤ R(i) holds. Let F_i denote the cumulative distribution function of the running time of configuration i. 
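Before turning to the proof, the finite-sum computation of L(G_i, r_i, t) noted at the end of Section 3.1 can be written out as follows. This is our illustrative sketch, writing ψ for the scaled tail of Equation (1); the clamp inside the logarithm is our own guard for the k = 0 boundary case, and sorting makes the sketch O(r log r) rather than the O(r) claimed for a pre-sorted list:

```python
# Compute L(G, r, t) = integral of psi(1 - G(x), r, t) dx for an empirical
# CDF G given by r capped runtime observations. G is a step function, so the
# integral reduces to a finite sum over sorted observations.
import math

def eps(k, r, t):
    """epsilon(k, r, t) = sqrt(9 * 2^k * ln(kt) / r), with the log argument
    clamped below at 2 (our guard for the k = 0 boundary case)."""
    return math.sqrt(9 * (2 ** k) * math.log(max(k * t, 2)) / r)

def psi(p, r, t):
    """Scaled-down tail probability from Equation (1)."""
    if p <= 0:
        return 0.0
    k = math.floor(math.log2(1 / p)) if p < 1 else 0
    e = eps(k, r, t)
    return p / (1 + e) if e <= 0.5 else 0.0

def lcb(observations, t):
    """L(G, r, t) as a finite sum: between consecutive sorted observations the
    tail 1 - G(x) is constant, so each segment contributes width * psi(tail)."""
    xs = sorted(observations)
    r = len(xs)
    total, prev = 0.0, 0.0
    for idx, x in enumerate(xs):
        tail = 1 - idx / r          # 1 - G(x) on the segment (prev, x]
        total += (x - prev) * psi(tail, r, t)
        prev = x
    return total
```

The bound behaves as the text describes: with only a few samples every dyadic tail level has ε > 1/2 and the bound collapses to the trivial value 0, while with many samples it approaches the empirical capped mean from below.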
Then R(i) = R 1\n0 1 Fi(x) dx, so in order to prove that\nL(Gi, ri, t) \uf8ff R(i) with high probability it suf\ufb01ces to prove that, with high probability, for all x\nthe inequality (1 Gi(x), ri, t) \uf8ff 1 Fi(x) holds. To do so we will apply a multiplicative error\nestimate from empirical process theory due to Wellner (1978). This error estimate can be used to\nderive the following error bound in our setting.\nLemma 3.2. Let x1, . . . , xn be independent random samples from a distribution with cumulative\ndistribution function F , and G their empirical CDF. For 0 \uf8ff b \uf8ff 1, x 0, and 0 \uf8ff \" \uf8ff 1/2\n2The exact queue size depends on the number of active instances, but this bound suf\ufb01ces for our example.\n\n6\n\n\fFigure1:Anillustrationofhowwecomputethelowerboundonacon\ufb01gu-ration\u2019saverageruntime.Thedistribu-tionofagivencon\ufb01guration\u2019struerun-timeisF(x);theempiricalCDF,G(x),constitutesobservationssampledfromF(x)andcensoredat\u2713.Thecon\ufb01gura-tion\u2019sexpectedruntime,thequantitywewanttoestimate,isthe(blue)shadedregionabovecurveF(x).Ourhigh-probabilitylowerboundonthisquantityisthe(green)areaaboveG(x),scaledtowards1asdescribedinEquation(1).Runtime\uf8ff0\u2713Probabilityofsolvinganinstance01/23/47/81F(x)G(x)de\ufb01netheeventsE1(b,x)={1G(x)b}andE2(\u270f,x)=1G(x)1+\">1F(x) 
Then we have Pr(∃x s.t. E1(b, x) and E2(ε, x)) ≤ exp(−ε²nb/4).

To justify the use of L(G_i, r_i, t) as a lower confidence bound on R(i), we apply Lemma 3.2 with b = 2⁻ᵏ, n = r and ε = ε(k, r, t). With these parameters, ε²nb/4 = (9/4) ln(kt), hence the lemma implies the following for all k, r, t:

    Pr(∃x s.t. E1(2⁻ᵏ, x) and E2(ε(k, r, t), x)) ≤ (kt)⁻⁹ᐟ⁴.    (2)

The inequality is used in the following proposition to show that L(G_i, r_i, t) is a lower bound on R(i) with high probability.

Lemma 3.3. For each configuration tester i, and each loop iteration t,

    Pr(∃x s.t. ℓ(1 − G_i(x), r_i, t) > 1 − F_i(x)) = O(t⁻⁵ᐟ⁴).    (3)

Consequently Pr(L(G_i, r_i, t) > R(i)) = O(t⁻⁵ᐟ⁴).

3.3 Running time analysis

Since SPC spends less time running bad configurations, we are able to show an improved runtime bound over SP. Suppose that i is (ε, δ)-suboptimal. We bound the expected amount of time devoted to running i during the first t loop iterations. We show that this quantity is O(ε⁻²δ⁻¹ log(t log(1/δ))). Summing over (ε, δ)-suboptimal configurations yields our main result, which is that Algorithm 1 is extremely unlikely to return an (ε, δ)-suboptimal configuration once its runtime exceeds the average runtime of the best configuration by a given factor. Write B(t, ε, δ) = ε⁻²δ⁻¹ log(t log(1/δ)).

Theorem 3.4. Fix ε and δ and let S be the set of (ε, δ)-optimal configurations. For each i ∉ S suppose that i is (ε_i, δ_i)-suboptimal, with ε_i ≥ ε and δ_i ≥ δ. If the time spent running SPC is

    Ω( R(i*) ( |S| · B(t, ε, δ) + Σ_{i ∉ S} B(t, ε_i, δ_i) ) ),

where i* denotes an optimal configuration, then SPC will return an (ε, δ)-optimal configuration when it is terminated, with high probability in t.

Rather than having an additive O(ε⁻²δ⁻¹) term for each of the n configurations considered (as is the case with SP), the bound in Theorem 3.4 has a term of the form O(ε_i⁻²δ_i⁻¹) for each configuration i that is not (ε, δ)-optimal, where ε_i⁻²δ_i⁻¹ is as small as possible. This can be a significant improvement in cases where many of the configurations being considered are far from being (ε, δ)-optimal.

To prove Theorem 3.4, we will make use of the following lemma, which bounds the time spent running configuration i in terms of its lower confidence bound and number of active instances.

Lemma 3.5. At any time, if the configuration tester for configuration i has r_i active instances and lower confidence bound L_i, then the total amount of running time that has been spent running configuration i is at most 9 r_i L_i.

The intuition is that because execution timeouts are successively doubled, the total time spent running on a given input instance j is not much more than the time of the most recent execution on j. But if we take an average over all active j, the total time spent on the most recent runs is precisely r times the average runtime under the empirical CDF. The result then follows from the following lemma, Lemma 3.6, which shows that L_i is at least a constant times this empirical average runtime.

Lemma 3.6. At any iteration t, if the configuration tester for configuration i has r_i active instances and G_i is the empirical CDF for R(i, j, θ_i), then L(G_i, r_i, t) ≥ (2/3) ∫₀^{θ_i} (1 − G_i(x)) dx.

Given Lemma 3.5, it suffices to argue that a sufficiently suboptimal configuration will have few active instances. This is captured by the following lemma.

Lemma 3.7. If configuration i is (ε_i, δ_i)-suboptimal then at any iteration t, the expected number of active instances for configuration tester i is bounded by O(ε_i⁻²δ_i⁻¹ log(t log(1/δ_i))), and the expected amount of time spent running configuration i on those instances is bounded by O(R(i*) · ε_i⁻²δ_i⁻¹ log(t log(1/δ_i))), where i* denotes an optimal configuration.

Intuitively, Lemma 3.7 follows because in order for the algorithm to select a suboptimal configuration i, it must be that the lower bound for i is less than the lower bound for an optimal configuration. Since the lower bounds are valid with high probability, this can only happen if the lower bound for configuration i is not yet very tight. Indeed, it must be significantly less than R(i) for some threshold τ with Pr_j(R(i, j) > τ) ≥ δ_i.
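The doubling intuition behind Lemma 3.5 can be checked with a few lines of arithmetic. The sketch below is our own illustration, not part of the SPC implementation: with timeouts θ, 2θ, 4θ, ..., the total time spent on a single instance is θ(2ᵏ − 1), which is always less than twice the most recent timeout θ·2ᵏ⁻¹.

```python
# Illustration of the doubling-timeout bookkeeping (helper name is ours):
# theta + 2*theta + ... + 2^(k-1)*theta = (2^k - 1)*theta,
# which is strictly less than 2 * (2^(k-1)*theta), the doubled last timeout.

def total_time_with_doubling(theta: float, num_runs: int) -> float:
    """Total time across runs with timeouts theta, 2*theta, 4*theta, ..."""
    return sum(theta * 2 ** k for k in range(num_runs))

theta, num_runs = 1.0, 10
last_timeout = theta * 2 ** (num_runs - 1)          # 512.0
total = total_time_with_doubling(theta, num_runs)   # 1023.0
assert total < 2 * last_timeout                     # geometric-sum bound
```

Averaging this per-instance bound over the r_i active instances is what connects the total time spent on configuration i to the empirical mean runtime, and hence (via Lemma 3.6) to the lower confidence bound L_i.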
However, the lower bound cannot remain this loose for long: once the threshold θ gets large enough relative to τ, and we take sufficiently many samples as a function of ε_i and δ_i, standard concentration bounds will imply that the empirical CDF (and hence our lower bound) will approximate the true runtime distribution over the range [0, τ]. Once this happens, the lower bound will exceed the average runtime of the optimal distribution, and configuration i will stop receiving time from the scheduler.

Lemma 3.7 also gives us a way of determining ε and δ from an empirical run of SPC. If SPC returns configuration i at time t, then by Lemma 3.7 i will not be (ε, δ)-suboptimal for any ε and δ for which r_i = Ω(ε⁻²δ⁻¹ log(t log(1/δ))), where r_i is the number of active instances for i at termination time. Thus, given a choice of ε and the value of r_i at termination, one can solve to determine a δ for which i is guaranteed to be (ε, δ)-optimal. See Appendix E for further details.

Given Lemma 3.7, Theorem 3.4 follows from a straightforward counting argument; see Appendix B.

4 Handling Many Configurations

Algorithm 1 assumes a fixed set N of n possible configurations. In practice, these configurations are often determined by the settings of dozens or even hundreds of parameters, some of which might have continuous domains. In these cases, it is not practical for the search procedure to take time proportional to the number of all possible configurations. However, like Structured Procrastination, the SPC procedure can be modified to handle such cases. What follows is a brief discussion; due to space constraints, the details are provided in the supplementary material.

The first idea is to sample a set N̂ of n configurations from the large (or infinite) pool, and run Algorithm 1 on the sampled set.
This yields an (ε, δ)-optimality guarantee with respect to the best configuration in N̂. Assuming the samples are representative, this corresponds to the top (1/n)'th quantile of runtimes over all configurations. We can then imagine running instances of SPC in parallel with successively doubled sample sizes, appropriately weighted, so that we make progress on estimating the top (1/2ᵏ)'th quantile simultaneously for each k. This ultimately leads to an extension of Theorem 3.4 in which, for any γ > 0, one obtains a configuration that is (ε, δ)-optimal with respect to OPT_γ, the top γ-quantile of configuration runtimes. This method is anytime, and the time required for a given ε, δ, and γ is (up to log factors) OPT_γ · (1/γ) times the expected minimum time needed to determine whether a randomly chosen configuration is (ε, δ)-suboptimal relative to OPT_γ.

5 Experimental Results

We experiment³ with SPC on the benchmark set of runtimes generated by Weisz et al. (2018b) for testing LEAPSANDBOUNDS. This data consists of pre-computed runtimes for 972 configurations of the minisat (Sorensson & Een, 2005) SAT solver on 20118 SAT instances generated using CNFuzzDD⁴.

³Code to reproduce experiments is available at https://github.com/drgrhm/alg_config

Figure 2: Mean runtimes for solutions returned by SPC after various amounts of compute time (blue line), and for those returned by LB for different ε, δ pairs (red points). For LB, each point represents a different ε, δ combination. Its size represents the value of ε, and its color intensity represents the value of δ. SPC is able to find a good solution relatively quickly. Different ε, δ pairs can lead to drastically different runtimes, while still returning the same configuration. The x-axis is in log scale.
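Configurations in a benchmark like this are compared by their mean runtime, with each individual run capped at the dataset's maximum (900s here). A minimal, self-contained sketch of that scoring rule on a toy runtime matrix (the function names and the data are ours, purely illustrative):

```python
CAP = 900.0  # the dataset's per-run cap, in seconds

def capped_mean_runtime(instance_runtimes, cap=CAP):
    """Mean runtime of one configuration, capping each run at `cap` seconds."""
    return sum(min(t, cap) for t in instance_runtimes) / len(instance_runtimes)

def rank_configurations(runtime_matrix, cap=CAP):
    """Return configuration indices sorted by capped mean runtime (best first).

    `runtime_matrix[i][j]` is configuration i's runtime on instance j.
    """
    scores = [capped_mean_runtime(row, cap) for row in runtime_matrix]
    return sorted(range(len(scores)), key=scores.__getitem__)

# Toy stand-in for the 972 x 20118 minisat benchmark matrix.
matrix = [
    [1000.0, 100.0],  # capped mean: (900 + 100) / 2 = 500.0
    [50.0, 50.0],     # capped mean: 50.0
    [900.0, 900.0],   # capped mean: 900.0
]
assert rank_configurations(matrix) == [1, 0, 2]
```

This capped-mean score is the y-axis quantity in Figure 2; statements like "top 1% of all configurations" refer to a configuration's rank under exactly this ordering.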
A key difference between SPC and LB is the former's anytime guarantee: unlike with LB, users need not choose values of ε or δ in advance. Our experiments investigate the impact of this property. To avoid conflating the results with effects due to restarts and their interaction with the multiplier of θ, all the times we considered were for the non-resuming simulated environment.

Figure 2 compares the solutions returned by SPC after various amounts of CPU compute time with those of LB and SP for different ε, δ pairs chosen from a grid with ε ∈ [0.1, 0.9] and δ ∈ [0.1, 0.5]. The x-axis measures CPU time in days, and the y-axis shows the expected runtime of the solution returned (capped at the dataset's max cap of 900s). The blue line shows the result of SPC over time. The red points show the result of LB for different ε, δ pairs, and the green points show this result for SP. The size of each point is proportional to ε, while the color is proportional to δ.

We draw two main conclusions from Figure 2. First, SPC was able to find a reasonable solution after a much smaller amount of compute time than LB. After only about 10 CPU days, SPC identified a configuration that was in the top 1% of all configurations in terms of max-capped runtime, while runs of LB took at least 100 CPU days for every ε, δ combination we considered. Second, choosing a good ε, δ combination for LB was not easy. One might expect that big, dark points would appear at shorter runtimes, while smaller, lighter ones would appear at higher runtimes. However, this was not the case. Instead, we see that different ε, δ pairs led to drastically different total runtimes, often while still returning the same configuration. SPC, by contrast, lets the user avoid this problem entirely. It settles on a fairly good configuration after about 100 CPU days.
If the user has a few hundred more CPU days to spare, they can continue to run SPC and eventually obtain the best solution reached by LB, and then the dataset's true optimal value after about 525 CPU days. However, even at this time scale, many ε, δ pairs led to worse configurations being returned by LB than by SPC.

6 Conclusion

We have presented Structured Procrastination with Confidence, an approximately optimal procedure for algorithm configuration. SPC is an anytime algorithm that uses a novel lower confidence bound, rather than a sample mean, to select configurations to explore. As a result, SPC adapts to problem instances in which it is easier to discard poorly-performing configurations. We are thus able to show an improved runtime bound for SPC over SP, while maintaining the anytime property of SP.

We compare SPC to other configuration procedures on a simple benchmark set of SAT solver runtimes, and show that SPC's anytime property can be helpful in finding good configurations, especially early on in the search process. However, a more comprehensive empirical investigation is needed, in particular in the setting of many configurations. Such large-scale experiments will be a significant engineering challenge, and we leave this avenue to future work.

⁴http://fmv.jku.at/cnfuzzdd/

References

Ansótegui, C., Sellmann, M., and Tierney, K. A gender-based genetic algorithm for automatic configuration of algorithms. In Principles and Practice of Constraint Programming (CP), pp. 142–157, 2009.

Ansótegui, C., Malitsky, Y., Sellmann, M., and Tierney, K. Model-based genetic algorithms for algorithm configuration. In International Joint Conference on Artificial Intelligence (IJCAI), pp. 733–739, 2015.

Auer, P., Cesa-Bianchi, N., and Fischer, P.
Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2–3):235–256, 2002.

Badanidiyuru, A., Kleinberg, R., and Slivkins, A. Bandits with knapsacks. In Foundations of Computer Science (FOCS), pp. 207–216, 2013.

Balcan, M., Dick, T., and Vitercik, E. Dispersion for data-driven algorithm design, online learning, and private optimization. In Proc. IEEE FOCS, pp. 603–614, 2018a.

Balcan, M.-F., Nagarajan, V., Vitercik, E., and White, C. Learning-theoretic foundations of algorithm configuration for combinatorial partitioning problems. In Conference on Learning Theory, pp. 213–274, 2017.

Balcan, M.-F., Dick, T., Sandholm, T., and Vitercik, E. Learning to branch. International Conference on Machine Learning, 2018b.

Bergstra, J. S., Bardenet, R., Bengio, Y., and Kégl, B. Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems (NIPS), pp. 2546–2554, 2011.

Birattari, M., Stützle, T., Paquete, L., and Varrentrapp, K. A racing algorithm for configuring metaheuristics. In Genetic and Evolutionary Computation Conference (GECCO), pp. 11–18, 2002.

Bubeck, S., Munos, R., Stoltz, G., and Szepesvári, C. X-armed bandits. Journal of Machine Learning Research, 12(May):1655–1695, 2011.

Bubeck, S., Cesa-Bianchi, N., et al. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5(1):1–122, 2012.

Cesa-Bianchi, N. and Lugosi, G. Combinatorial bandits. Journal of Computer and System Sciences, 78(5):1404–1422, 2012.

Chaudhuri, K., Freund, Y., and Hsu, D. J. A parameter-free hedging algorithm. In Advances in Neural Information Processing Systems (NIPS), pp. 297–305, 2009.

Ganchev, K., Nevmyvaka, Y., Kearns, M., and Vaughan, J. W. Censored exploration and the dark pool problem.
Communications of the ACM, 53(5):99–107, 2010.

Guha, S. and Munagala, K. Approximation algorithms for budgeted learning problems. In ACM Symposium on Theory of Computing (STOC), pp. 104–113, 2007.

Gupta, R. and Roughgarden, T. A PAC approach to application-specific algorithm selection. SIAM Journal on Computing, 46(3):992–1017, 2017.

Hutter, F., Hoos, H., and Stützle, T. Automatic algorithm configuration based on local search. In AAAI Conference on Artificial Intelligence, pp. 1152–1157, 2007.

Hutter, F., Hoos, H., Leyton-Brown, K., and Stützle, T. ParamILS: An automatic algorithm configuration framework. Journal of Artificial Intelligence Research, 36:267–306, 2009.

Hutter, F., Hoos, H., and Leyton-Brown, K. Bayesian optimization with censored response data. In NIPS Workshop on Bayesian Optimization, Sequential Experimental Design, and Bandits (BayesOpt'11), 2011a.

Hutter, F., Hoos, H., and Leyton-Brown, K. Sequential model-based optimization for general algorithm configuration. In Conference on Learning and Intelligent Optimization (LION), pp. 507–523, 2011b.

Hutter, F., Xu, L., Hoos, H. H., and Leyton-Brown, K. Algorithm runtime prediction: Methods & evaluation. Artificial Intelligence, 206:79–111, 2014.

Kandasamy, K., Dasarathy, G., Poczos, B., and Schneider, J. The multi-fidelity multi-armed bandit. In Advances in Neural Information Processing Systems (NIPS), pp. 1777–1785, 2016.

Kleinberg, R. Anytime algorithms for multi-armed bandit problems. In ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 928–936, 2006.

Kleinberg, R., Slivkins, A., and Upfal, E. Multi-armed bandits in metric spaces. In ACM Symposium on Theory of Computing, pp. 681–690, 2008.

Kleinberg, R., Leyton-Brown, K., and Lucier, B.
Efficiency through procrastination: Approximately optimal algorithm configuration with runtime guarantees. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017.

Lai, T. L. and Robbins, H. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1):4–22, 1985.

Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., and Talwalkar, A. Hyperband: A novel bandit-based approach to hyperparameter optimization. arXiv preprint arXiv:1603.06560, 2016.

López-Ibáñez, M., Dubois-Lacoste, J., Stützle, T., and Birattari, M. The irace package, iterated race for automatic algorithm configuration. Technical report, IRIDIA, Université Libre de Bruxelles, 2011. URL http://iridia.ulb.ac.be/IridiaTrSeries/IridiaTr2011-004.pdf.

Munos, R. From bandits to Monte-Carlo tree search: The optimistic principle applied to optimization and planning. Foundations and Trends in Machine Learning, 7(1):1–129, 2014.

Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., and de Freitas, N. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1):148–175, 2016.

Sorensson, N. and Een, N. MiniSat v1.13: A SAT solver with conflict-clause minimization. SAT, 2005(53):1–2, 2005.

Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M. W. Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 58(5):3250–3265, 2012.

Thornton, C., Hutter, F., Hoos, H., and Leyton-Brown, K. Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms. In Conference on Knowledge Discovery and Data Mining (KDD), pp. 847–855, 2013.

Tran-Thanh, L., Chapman, A., Rogers, A., and Jennings, N. R. Knapsack based optimal policies for budget-limited multi-armed bandits.
In AAAI Conference on Artificial Intelligence, 2012.

Weisz, G., György, A., and Szepesvári, C. CAPSANDRUNS: An improved method for approximately optimal algorithm configuration. ICML 2018 AutoML Workshop, 2018a.

Weisz, G., György, A., and Szepesvári, C. LEAPSANDBOUNDS: A method for approximately optimal algorithm configuration. In International Conference on Machine Learning, pp. 5254–5262, 2018b.

Wellner, J. A. Limit theorems for the ratio of the empirical distribution function to the true distribution function. Z. Wahrscheinlichkeitstheorie verw. Gebiete, 45:73–88, 1978.