{"title": "On Top-k Selection in Multi-Armed Bandits and Hidden Bipartite Graphs", "book": "Advances in Neural Information Processing Systems", "page_first": 1036, "page_last": 1044, "abstract": "This paper discusses how to efficiently choose from $n$ unknowndistributions the $k$ ones whose means are the greatest by a certainmetric, up to a small relative error. We study the topic under twostandard settings---multi-armed bandits and hidden bipartitegraphs---which differ in the nature of the input distributions. In theformer setting, each distribution can be sampled (in the i.i.d.manner) an arbitrary number of times, whereas in the latter, eachdistribution is defined on a population of a finite size $m$ (andhence, is fully revealed after $m$ samples). For both settings, weprove lower bounds on the total number of samples needed, and proposeoptimal algorithms whose sample complexities match those lower bounds.", "full_text": "On Top-k Selection in Multi-Armed Bandits and\n\nHidden Bipartite Graphs\n\nWei Cao1\n\nJian Li1\n1Tsinghua University\n\n1{cao-w13@mails, lijian83@mail, zz-li14@mails}.tsinghua.edu.cn\n\n2taoyf@cse.cuhk.edu.hk\n\nYufei Tao2\n\nZhize Li1\n\n2Chinese University of Hong Kong\n\nAbstract\n\nThis paper discusses how to ef\ufb01ciently choose from n unknown distributions the k\nones whose means are the greatest by a certain metric, up to a small relative error.\nWe study the topic under two standard settings\u2014multi-armed bandits and hidden\nbipartite graphs\u2014which differ in the nature of the input distributions. In the for-\nmer setting, each distribution can be sampled (in the i.i.d. manner) an arbitrary\nnumber of times, whereas in the latter, each distribution is de\ufb01ned on a population\nof a \ufb01nite size m (and hence, is fully revealed after m samples). 
For both settings, we prove lower bounds on the total number of samples needed, and propose optimal algorithms whose sample complexities match those lower bounds.

1 Introduction

This paper studies a class of problems that share a common high-level objective: from a number n of probabilistic distributions, find the k ones whose means are the greatest by a certain metric.

Crowdsourcing. A crowdsourcing algorithm (see recent works [1, 13] and the references therein) summons a certain number, say k, of individuals, called workers, to collaboratively accomplish a complex task. Typically, the algorithm breaks the task into a potentially very large number of micro-tasks, each of which makes a binary decision (yes or no) by taking the majority vote from the participating workers. Each worker is given an (often monetary) reward for every micro-task that s/he participates in. It is therefore crucial to identify the most reliable workers, namely those with the highest rates of making correct decisions. Because of this, a crowdsourcing algorithm should ideally be preceded by an exploration phase, which selects the best k workers from n candidates through a series of "control questions". Every control question must be paid for in the same way as a micro-task. The challenge is to find the best workers with the least amount of money.

Frequent Pattern Discovery. Let B and W be two relations. Given a join predicate Q(b, w), the joining power of a tuple b ∈ B equals the number of tuples w ∈ W such that b and w satisfy Q. A top-k semi-join [14, 17] returns the k tuples in B with the greatest joining power. This type of semi-join is notoriously difficult to process when the evaluation of Q is complicated, and thus unfriendly to tailor-made optimization. 
A well-known example from graph databases is the discovery of frequent patterns [14], where B is a set of graph patterns, W a set of data graphs, and Q(b, w) decides if a pattern b is a subgraph of a data graph w. In this case, the top-k semi-join essentially returns the set of k graph patterns most frequently found in the data graphs. Given a black box for resolving subgraph isomorphism Q(b, w), the challenge is to minimize the number of calls to the black box. We refer the reader to [14, 15] for more examples of difficult top-k semi-joins of this sort.

1.1 Problem Formulation

The paper studies four problems that capture the essence of the above applications.

Multi-Armed Bandit. We consider a standard setting of stochastic multi-armed bandit selection. Specifically, there is a bandit with a set B of n arms, where the i-th arm is associated with a Bernoulli distribution with an unknown mean θi ∈ (0, 1]. In each round, we choose an arm, pull it, and then collect a reward, which is an i.i.d. sample from the arm's reward distribution.

Given a subset V ⊆ B of arms, we denote by ai(V) the arm with the i-th largest mean in V, and by θi(V) the mean of ai(V). 
Define θavg(V) = (1/k) ∑_{i=1}^{k} θi(V), namely, the average of the means of the top-k arms in V.

Our first two problems aim to identify k arms whose means are the greatest either individually or aggregatively:

Problem 1 [Top-k Arm Selection (k-AS)] Given parameters ε ∈ (0, 1/4), δ ∈ (0, 1/48), and k ≤ n/2, we want to select a k-sized subset V of B such that, with probability at least 1 − δ, it holds that

θi(V) ≥ (1 − ε)θi(B), ∀i ≤ k.

We further study a variation of k-AS where we change the multiplicative guarantee θi(V) ≥ (1 − ε)θi(B) to an additive guarantee θi(V) ≥ θi(B) − ε′. We refer to the modified problem as Top-kadd Arm Selection (kadd-AS). Due to the space constraint, we present all the details of kadd-AS in Appendix C.

Problem 2 [Top-kavg Arm Selection (kavg-AS)] Given the same parameters as in k-AS, we want to select a k-sized subset V of B such that, with probability at least 1 − δ, it holds that

θavg(V) ≥ (1 − ε)θavg(B).

For both problems, the cost of an algorithm is the total number of arms pulled, or equivalently, the total number of samples drawn from the arms' distributions. For this reason, we refer to the cost as the algorithm's sample complexity. It is easy to see that k-AS is more stringent than kavg-AS; hence, a feasible solution to the former is also a feasible solution to the latter, but not vice versa.

Hidden Bipartite Graph. The second main focus of the paper is the exploration of hidden bipartite graphs. Let G = (B, W, E) be a bipartite graph, where the nodes in B are colored black, and those in W colored white. Set n = |B| and m = |W|. The edge set E is hidden in the sense that an algorithm does not see any edge at the beginning. 
To find out whether an edge exists between a black vertex b and a white vertex w, the algorithm must perform a probe operation. The cost of the algorithm equals the number of such operations performed.

If an edge exists between b and w, we say that there is a solid edge between them; otherwise, we say that they have an empty edge. Let deg(b) be the degree of a black vertex b, namely, the number of solid edges of b. Given a subset of black vertices V ⊆ B, we denote by bi(V) the black vertex with the i-th largest degree in V, and by degi(V) the degree of bi(V). Furthermore, define degavg(V) = (1/k) ∑_{i=1}^{k} degi(V).

We now state the other two problems studied in this work, which aim to identify k black vertices whose degrees are the greatest either individually or aggregatively:

Problem 3 [k-Most Connected Vertex [14] (k-MCV)] Given parameters ε ∈ (0, 1/4), δ ∈ (0, 1/48), and k ≤ n/2, we want to select a k-sized subset V of B such that, with probability at least 1 − δ, it holds that

degi(V) ≥ (1 − ε) degi(B), ∀i ≤ k.

Problem 4 [kavg-Most Connected Vertex (kavg-MCV)] Given the same parameters as in k-MCV, we want to select a k-sized subset V of B such that, with probability at least 1 − δ, it holds that

degavg(V) ≥ (1 − ε) degavg(B).

A feasible solution to k-MCV is also feasible for kavg-MCV, but not vice versa. We will refer to the cost of an algorithm also as its sample complexity, by regarding a probe operation as "sampling" the edge probed. For any deterministic algorithm, the adversary can force the algorithm to always probe Ω(mn) edges. Hence, we only consider randomized algorithms.

k-MCV can be reduced to k-AS. 
Given a hidden bipartite graph (B, W, E), we can treat every black vertex b ∈ B as an "arm" associated with a Bernoulli reward distribution: the reward is 1 with probability deg(b)/m (recall m = |W|), and 0 with probability 1 − deg(b)/m. Any algorithm A for k-AS can be deployed to solve k-MCV as follows. Whenever A samples from arm b, we randomly choose a white vertex w ∈ W, and probe the edge between b and w. A reward of 1 is returned to A if and only if the edge exists.

k-AS and k-MCV differ, however, in the size of the population that a reward distribution is defined on. For k-AS, the reward of each arm is sampled from a population of an indefinite size, which can even be infinite. Consequently, k-AS nicely models situations such as the crowdsourcing application mentioned earlier.

For k-MCV, the reward distribution of each "arm" (i.e., a black vertex b) is defined on a population of size m = |W| (i.e., the edges of b). This has three implications. First, k-MCV is a better modeling of applications like top-k semi-join (where an edge exists between b ∈ B and w ∈ W if and only if Q(b, w) is true). Second, the problem admits an obvious algorithm with cost O(nm) (recall n = |B|): simply probe all the hidden edges. Third, an algorithm never needs to probe the same edge between b and w twice—once probed, whether the edge is solid or empty is perpetually revealed. We refer to the last implication as the history-awareness property.

The above discussion on k-AS and k-MCV also applies to kavg-AS and kavg-MCV. For each of the above problems, we refer to an algorithm which achieves the precision and failure requirements prescribed by ε and δ as an (ε, δ)-approximate algorithm.

1.2 Previous Results

Problem 1. Sheng et al. [14] presented an algorithm1 that solves k-AS with expected cost O((n/ε²) · (1/θk(B)) · log(n/δ)). 
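To make the reduction above concrete, here is a minimal Python sketch (an illustration, not code from the paper; the toy edge set, the `probe` callback, and the `make_arm` wrapper are hypothetical stand-ins for whatever graph oracle is available):

```python
import random

random.seed(1)

def make_arm(b, white_vertices, probe):
    """Wrap black vertex b as a Bernoulli 'arm' with mean deg(b)/m.

    Each pull probes the edge between b and a uniformly random white
    vertex, and returns 1 iff that edge is solid (i.e., exists)."""
    def pull():
        w = random.choice(white_vertices)
        return 1 if probe(b, w) else 0
    return pull

# Toy usage: a hidden graph given by an explicit edge set.
E = {("b1", "w1"), ("b1", "w2"), ("b2", "w1")}
W = ["w1", "w2", "w3", "w4"]
probe = lambda b, w: (b, w) in E

arm = make_arm("b1", W, probe)
# The empirical mean over many pulls approaches deg(b1)/m = 2/4.
est = sum(arm() for _ in range(20000)) / 20000
```

Any k-AS algorithm can then be run against such arms unchanged, which is exactly how the k-MCV upper bounds in this paper are obtained.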
No lower bound is known on the sample complexity of k-AS. The closest work is due to Kalyanakrishnan et al. [11]. They considered the EXPLORE-k problem, where the goal is to return a set V of k arms such that, with probability at least 1 − δ, the mean of each arm in V is at least θk(B) − ε′. They showed an algorithm with sample complexity Θ((n/ε′²) log(k/δ)) in expectation and established a matching lower bound. Note that EXPLORE-k ensures an absolute-error guarantee, which is weaker than the individually relative-error guarantee of k-AS. Therefore, the same EXPLORE-k lower bound also applies to k-AS.

The reader may be tempted to set ε′ = ε · θk(B) to derive a "lower bound" of Ω((n/(ε² (θk(B))²)) log(k/δ)) for k-AS. This, however, is clearly wrong because when θk(B) = o(1) (a typical case in practice) this "lower bound" may be even higher than the upper bound of [14] mentioned earlier. The cause of the error lies in that the hard instance constructed in [11] requires θk(B) = Ω(1).

Problem 2. The O((n/ε²) · (1/θk(B)) · log(n/δ)) upper bound of [14] on k-AS carries over to kavg-AS (which, as mentioned before, can be solved by any k-AS algorithm). Zhou et al. [16] considered an OPTMAI problem whose goal is to find a k-sized subset V such that θavg(V) ≥ θavg(B) − ε′ holds with probability at least 1 − δ. Note, once again, that this is an absolute-error guarantee, as opposed to the relative-error guarantee of kavg-AS. For OPTMAI, Zhou et al. presented an algorithm with sample complexity O((n/ε′²)(1 + log(1/δ)/k)) in expectation. Observe that if θavg(B) is available magically in advance, we can immediately apply the OPTMAI algorithm of [16] to settle kavg-AS by setting ε′ = ε · θavg(B). The expected cost of the algorithm becomes O((n/ε²) · (1/(θavg(B))²) · (1 + log(1/δ)/k)) (which is suboptimal. 
See Table 1).

No lower bound is known on the sample complexity of kavg-AS. For OPTMAI, Zhou et al. [16] proved a lower bound of Ω((n/ε′²)(1 + log(1/δ)/k)), which directly applies to kavg-AS due to its stronger quality guarantee.

Problems 3 and 4. Both problems can be trivially solved with cost O(nm). Furthermore, as explained in Section 1.1, k-MCV and kavg-MCV can be reduced to k-AS and kavg-AS respectively. Indeed, the best existing k-AS and kavg-AS algorithms (surveyed in the above) serve as the state of the art for k-MCV and kavg-MCV, respectively.

Prior to this work, no lower bound results were known for k-MCV and kavg-MCV. Note that none of the lower bounds for k-AS (or kavg-AS) is applicable to k-MCV (or kavg-MCV, resp.), because there is no reduction from the former problem to the latter.

1.3 Our Results

We obtain tight upper and lower bounds for all of the problems defined in Section 1.1. Our main results are summarized in Table 1 (all bounds are in expectation). Next, we explain several highlights, and provide an overview into our techniques.

Table 1: Comparison of our and previous results (all bounds are in expectation)

k-AS, upper bound:
  O((n/ε²) · (1/θk(B)) · log(n/δ))   [14]
  O((n/ε²) · (1/θk(B)) · log(k/δ))   new
k-AS, lower bound:
  Ω((n/ε²) · log(k/δ))   [11]
  Ω((n/ε²) · (1/θk(B)) · log(k/δ))   new

kavg-AS, upper bound:
  O((n/ε²) · (1/θk(B)) · log(n/δ))   [14]
  O((n/ε²) · (1/(θavg(B))²) · (1 + log(1/δ)/k))   [16]
  O((n/ε²) · (1/θavg(B)) · (1 + log(1/δ)/k))   new
kavg-AS, lower bound:
  Ω((n/ε²) · (1 + log(1/δ)/k))   [16]
  Ω((n/ε²) · (1/θavg(B)) · (1 + log(1/δ)/k))   new

k-MCV, upper bound:
  O(min{(n/ε²) · (m/degk(B)) · log(n/δ), nm})   [14]
  O(min{(n/ε²) · (m/degk(B)) · log(k/δ), nm})   new
k-MCV, lower bound (new):
  Ω((n/ε²) · (m/degk(B)) · log(k/δ)) if degk(B) ≥ Ω((1/ε²) log(n/δ))
  Ω(nm) if degk(B) < O(1/ε)

kavg-MCV, upper bound:
  O(min{(n/ε²) · (m/degk(B)) · log(n/δ), nm})   [14]
  O(min{(n/ε²) · (m²/(degavg(B))²) · (1 + log(1/δ)/k), nm})   [16]
  O(min{(n/ε²) · (m/degavg(B)) · (1 + log(1/δ)/k), nm})   new
kavg-MCV, lower bound (new):
  Ω((n/ε²) · (m/degavg(B)) · (1 + log(1/δ)/k)) if degavg(B) ≥ Ω((1/ε²) log(n/δ))
  Ω(nm) if degavg(B) < O(1/ε)

1The algorithm was designed for k-MCV, but it can be adapted to k-AS as well.

k-AS. Our algorithm improves the log n factor of [14] to log k (in practice k ≪ n), thereby achieving the optimal sample complexity (Theorem 1).

Our analysis for k-AS is inspired by [8, 10, 11] (in particular the median elimination technique in [8]). However, the details are very different and more involved than the previous ones (the application of median elimination in [8] was in a much simpler context where the analysis was considerably easier). 
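To get a rough quantitative feel for the log n → log k saving (an illustrative calculation of ours, not from the paper), consider the scale of the experiments in Section 6:

```python
import math

# Compare the logarithmic factors log(n/δ) vs. log(k/δ) appearing in the
# [14] bound and in the new k-AS bound, at the experimental scale of
# Section 6 (n = 5000, k = 20, δ = 0.1). All other factors are shared.
n, k, delta = 5000, 20, 0.1
old_factor = math.log(n / delta)   # log(n/δ)
new_factor = math.log(k / delta)   # log(k/δ)
ratio = old_factor / new_factor    # roughly a 2x saving at this scale
```

The saving grows with n for fixed k, which is the regime (k ≪ n) the paper targets.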
On the lower bound side, our argument is similar to that of [11], but we need to get rid of the θk(B) = Ω(1) assumption (as explained in Section 1.2), which requires several changes in the analysis (Theorem 2).

kavg-AS. Our algorithm improves both existing solutions in [14, 16] significantly, noticing that both θk(B) and (θavg(B))² are never larger, but can be far smaller, than θavg(B). This improvement results from an enhanced version of median elimination, and once again, requires a non-trivial analysis specific to our context (Theorem 4). Our lower bound is established with a novel reduction from the 1-AS problem (Theorem 5). It is worth noting that the reduction can be used to simplify the proof of the lower bound in [16, Theorem 5.5].

k-MCV and kavg-MCV. The stated upper bounds for k-MCV and kavg-MCV in Table 1 can be obtained directly from our k-AS and kavg-AS algorithms. In contrast, all the lower-bound arguments for k-AS and kavg-AS—which crucially rely on the samples being i.i.d.—break down for the two MCV problems, due to the history-awareness property explained in Section 1.1.

For k-MCV, we remedy the issue by (i) (when degk(B) is large) a reduction from k-AS, and (ii) (when degk(B) is small) a reduction from a sampling lower bound for distinguishing two extremely similar distributions (Theorem 3). Analogous ideas are deployed for kavg-MCV (Theorem 6). Note that for a small range of degk(B) (i.e., Ω(1/ε) < degk(B) < O((1/ε²) log(n/δ))), we do not have the optimal lower bounds yet for k-MCV and kavg-MCV. Closing the gap is left as an interesting open problem.

Algorithm 1: ME-AS
1 input: B, ε, δ, k
2 for μ = 1/2, 1/4, . . . 
do
3     S = ME(B, ε, δ, μ, k);
4     {(ai, θ̂US(ai)) | 1 ≤ i ≤ k} = US(S, ε, δ, (1 − ε/2)μ, k);
5     if θ̂US(ak) ≥ 2μ then
6         return {a1, . . . , ak};

Algorithm 2: Median Elimination (ME)
1 input: B, ε, δ, μ, k
2 S1 = B, ε1 = ε/16, δ1 = δ/8, μ1 = μ, and ℓ = 1;
3 while |Sℓ| > 4k do
4     sample every arm a ∈ Sℓ for Qℓ = (12/εℓ²)(1/μℓ) log(6k/δℓ) times;
5     for each arm a ∈ Sℓ do
6         its empirical value θ̂(a) = the average of the Qℓ samples from a;
7     a1, . . . , a|Sℓ| = the arms sorted in non-increasing order of their empirical values;
8     Sℓ+1 = {a1, . . . , a|Sℓ|/2};
9     εℓ+1 = 3εℓ/4, δℓ+1 = δℓ/2, μℓ+1 = (1 − εℓ)μℓ, and ℓ = ℓ + 1;
10 return Sℓ;

Algorithm 3: Uniform Sampling (US)
1 input: S, ε, δ, μs, k
2 sample every arm a ∈ S for Q = (96/ε²)(1/μs) log(4|S|/δ) times;
3 for each arm a ∈ S do
4     its US-empirical value θ̂US(a) = the average of the Q samples from a;
5 a1, . . . , a|S| = the arms sorted in non-increasing order of their US-empirical values;
6 return {(a1, θ̂US(a1)), . . . , (ak, θ̂US(ak))}

2 Top-k Arm Selection

In this section, we describe a new algorithm for the k-AS problem. We present the detailed analysis in Appendix B.

Our k-AS algorithm consists of three components: ME-AS, Median Elimination (ME), and Uniform Sampling (US), as shown in Algorithms 1, 2, and 3, respectively.

Given parameters B, ε, δ, k (as in Problem 1), ME-AS takes a "guess" μ (Line 2) on the value of θk(B), and then applies ME (Line 3) to prune B down to a set S of at most 4k arms. 
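Before continuing to Lines 4–6, the pruning step just introduced (Algorithm 2) can be sketched as a minimal Python simulation. This is an illustrative sketch under simplifying assumptions, not the authors' implementation: arms are simulated as Bernoulli variables given by their (normally unknown) true means, and the `cap` argument is our own addition that truncates the per-round sample count Qℓ purely so the demo runs quickly:

```python
import math
import random

def median_elimination(means, eps, delta, mu, k, cap=2000):
    """Demo of ME (Algorithm 2) on simulated Bernoulli arms.

    `means` holds the true arm means and is used only to simulate pulls;
    `cap` bounds the per-round sample count for speed (the actual
    algorithm always uses the full Q_l computed below)."""
    S = list(range(len(means)))
    e, d, m = eps / 16, delta / 8, mu            # Line 2: eps_1, delta_1, mu_1
    while len(S) > 4 * k:                        # Line 3
        Q = min(int((12 / e**2) * (1 / m) * math.log(6 * k / d)) + 1, cap)
        # Lines 4-6: empirical mean of Q i.i.d. Bernoulli pulls per arm.
        est = {a: sum(random.random() < means[a] for _ in range(Q)) / Q
               for a in S}
        S.sort(key=lambda a: est[a], reverse=True)
        S = S[:len(S) // 2]                      # Lines 7-8: keep better half
        e, d, m = 3 * e / 4, d / 2, (1 - e) * m  # Line 9: tighten parameters
    return S                                     # Line 10

random.seed(0)
means = [0.9, 0.85, 0.8, 0.75, 0.2, 0.15, 0.1, 0.05, 0.05, 0.05]
survivors = median_elimination(means, eps=0.25, delta=0.1, mu=0.5, k=1)
```

Starting from 10 arms with k = 1, the loop halves the candidate set twice (10 → 5 → 2) and, with high probability, the survivors come from the high-mean group.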
Then, at Line 4, US is invoked to process S. At Line 5, (as will be clear shortly) the value of θ̂US(ak) is what ME-AS thinks should be the value of θk(B); thus, the algorithm performs a quality check to see whether θ̂US(ak) is larger than but close to μ. If the check fails, ME-AS halves its guess μ (Line 2), and repeats the above steps; otherwise, the output of US from Line 4 is returned as the final result.

ME runs in rounds. Round ℓ (= 1, 2, ...) is controlled by parameters Sℓ, εℓ, δℓ, and μℓ (their values for Round 1 are given at Line 2). In general, Sℓ is the set of arms from which we still want to sample. For each arm a ∈ Sℓ, ME takes Qℓ (Line 4) samples from a, and calculates its empirical value θ̂(a) (Lines 5 and 6). ME drops (at Lines 7 and 8) the half of the arms in Sℓ with the smallest empirical values, and then (at Line 9) sets the parameters of the next round. ME terminates by returning Sℓ as soon as |Sℓ| is at most 4k (Lines 3 and 10).

US simply takes Q samples from each arm a ∈ S (Line 2), and calculates its US-empirical value θ̂US(a) (Lines 3 and 4). Finally, US returns the k arms in S with the largest US-empirical values (Lines 5 and 6).

Remark. 
If we ignore Line 3 of Algorithm 1 and simply set S = B, then ME-AS degenerates into the algorithm in [14].

Theorem 1 ME-AS solves the k-AS problem with expected cost O((n/ε²) · (1/θk(B)) · log(k/δ)).

We extend the proof in [11] and establish the lower bound for k-AS as shown in Theorem 2.

Theorem 2 For any ε ∈ (0, 1/4) and δ ∈ (0, 1/48), given any algorithm, there is an instance of the k-AS problem on which the algorithm must entail Ω((n/ε²) · (1/θk(B)) · log(k/δ)) cost in expectation.

3 k-MOST CONNECTED VERTEX

This section is devoted to the k-MCV problem (Problem 3). We will focus on lower bounds because our k-AS algorithm in the previous section also settles k-MCV with the cost claimed in Table 1 by applying the reduction described in Section 1.1. We establish matching lower bounds below:

Theorem 3 For any ε ∈ (0, 1/12) and δ ∈ (0, 1/48), the following statements are true about any k-MCV algorithm:

• when degk(B) ≥ Ω((1/ε²) log(n/δ)), there is an instance on which the algorithm must probe Ω((n/ε²) · (m/degk(B)) · log(k/δ)) edges in expectation.

• when degk(B) < O(1/ε), there is an instance on which the algorithm must probe Ω(nm) edges in expectation.

For large degk(B) in Theorem 3, we utilize an instance of k-AS to construct a random hidden bipartite graph and feed it to any algorithm that solves k-MCV. By doing this, we reduce k-AS to k-MCV and thus establish our first lower bound.

For small degk(B), we define the single-vertex problem, where the goal is to distinguish two extremely similar distributions. We prove a lower bound for the single-vertex problem and reduce it to k-MCV. Thus, we establish our second lower bound. 
The details are presented in Appendix D.

4 Top-kavg Arm Selection

Our kavg-AS algorithm QE-AS is similar to ME-AS described in Section 2, except that the parameters are adjusted appropriately, as shown in Algorithms 4, 5, and 6, respectively. We present the details in Appendix E.

Theorem 4 QE-AS solves the kavg-AS problem with expected cost O((n/ε²) · (1/θavg(B)) · (1 + log(1/δ)/k)).

We establish the lower bound for kavg-AS as shown in Theorem 5.

Theorem 5 For any ε ∈ (0, 1/12) and δ ∈ (0, 1/48), given any (ε, δ)-approximate algorithm, there is an instance of the kavg-AS problem on which the algorithm must entail Ω((n/ε²) · (1/θavg(B)) · (1 + log(1/δ)/k)) cost in expectation.

We show that the lower bound of kavg-AS is the maximum of Ω((n/ε²) · (1/θavg(B))) and Ω((n/ε²) · (1/θavg(B)) · log(1/δ)/k). Our proof of the first lower bound is based on a novel reduction from 1-AS. We stress that our reduction can be used to simplify the proof of the lower bound in [16, Theorem 5.5].

5 kavg-MOST CONNECTED VERTEX

Our kavg-AS algorithm, combined with the reduction described in Section 1.1, already settles kavg-MCV with the sample complexity given in Table 1. 
We establish the following lower bound and prove it in Appendix F.

Theorem 6 For any ε ∈ (0, 1/12) and δ ∈ (0, 1/48), the following statements are true about any kavg-MCV algorithm:

• when degavg(B) ≥ Ω((1/ε²) log(n/δ)), there is an instance on which the algorithm must probe Ω((n/ε²) · (m/degavg(B)) · (1 + log(1/δ)/k)) edges in expectation.

• when degavg(B) < O(1/ε), there is an instance on which the algorithm must probe Ω(nm) edges in expectation.

Algorithm 4: QE-AS
1 input: B, ε, δ, k
2 for μ = 1/2, 1/4, . . . do
3     S = QE(B, ε, δ, μ, k);
4     {(ai | 1 ≤ i ≤ k), θ̂USavg} = US(S, ε, δ, (1 − ε/2)μ, k);
5     if θ̂USavg ≥ 2μ then
6         return {a1, . . . , ak};

Algorithm 5: Quartile Elimination (QE)
1 input: B, ε, δ, μ, k
2 S1 = B, ε1 = ε/32, δ1 = δ/8, μ1 = μ, and ℓ = 1;
3 while |Sℓ| > 4k do
4     sample every arm a ∈ Sℓ for Qℓ = (48/εℓ²)(1/μℓ)(1 + log(2/δℓ)/k) times;
5     for each arm a ∈ Sℓ do
6         its empirical value θ̂(a) = the average of the Qℓ samples from a;
7     a1, . . . , a|Sℓ| = the arms sorted in non-increasing order of their empirical values;
8     Sℓ+1 = {a1, . . . 
, a3|Sℓ|/4};
9     εℓ+1 = 7εℓ/8, δℓ+1 = δℓ/2, μℓ+1 = (1 − εℓ)μℓ, and ℓ = ℓ + 1;
10 return Sℓ;

Algorithm 6: Uniform Sampling (US)
1 input: S, ε, δ, μs, k
2 sample every arm a ∈ S for Q = (120/ε²)(1/μs)(1 + log(4/δ)/k) times;
3 for each arm a ∈ S do
4     its US-empirical value θ̂US(a) = the average of the Q samples from a;
5 a1, . . . , a|S| = the arms sorted in non-increasing order of their US-empirical values;
6 return {(a1, . . . , ak), θ̂USavg = (1/k) ∑_{i=1}^{k} θ̂US(ai)}

6 Experiment Evaluation

Due to the space constraint, we show only the experiments that compare ME-AS and AMCV [14] for the k-MCV problem. Additional experiments can be found in Appendix G. We use two synthetic data sets and one real-world data set to evaluate the algorithms. Each dataset is represented as a bipartite graph with n = m = 5000. For the synthetic data, the degrees of the black vertices follow a power law distribution: for each black vertex b ∈ B, its degree equals d with probability c(d + 1)^(−τ), where τ is the parameter to be set and c is the normalizing factor. Furthermore, for each black vertex with degree d, we connect it to d randomly selected white vertices. Thus, we build two bipartite graphs by setting the proper parameters so as to control the average degrees of the black vertices to be 50 and 3000, respectively. For the real-world data, we crawled 5000 active users from Twitter with their corresponding relationships. We construct a bipartite graph G = (B, W, E) where each of B and W represents all the users and E represents the 2-hop relationships. 
We say two users b ∈ B and w ∈ W have a 2-hop relationship if they share at least one common friend.

As the theoretical analysis is rather pessimistic due to the extensive usage of the union bound, to make a fair comparison we adopt the same strategy as in [14], i.e., we divide the theoretical sample cost by a heuristic constant ξ. We use the same parameter ξ = 2000 for AMCV as in [14]. For ME-AS, we first take ξ = 10^7 for each round of the median elimination step, and then we use the previous sample cost divided by 250 as the number of samples of the uniform sampling step. Notice that this does not conflict with the theoretical sample complexity, since the median elimination step dominates the sample complexity of the algorithm.

We fix the parameters δ = 0.1, k = 20 and enumerate ε from 0.01 to 0.1. We then calculate the actual failure probability by counting the successful runs in 100 repeats. Recall that due to the heuristic nature, the algorithm may not achieve the theoretical guarantees prescribed by (ε, δ). Whenever this happens, we label the percentage of the actual error εa it achieves at the failure probability δ. For example, 2.9 means the algorithm actually achieves an error εa = 0.029 with failure probability δ. The experiment result is shown in Figure 1.

(a) Power law with average degree 50  (b) Power law with average degree 3000  (c) 2-hop

Figure 1: Performance comparison for k-MCV vs. ε

As we can see, ME-AS outperforms AMCV in both sample complexity and actual error on all data sets. We stress that in the worst case, ME-AS only shows a difference when n ≫ k. However, for most real-world data, the degrees of the vertices usually follow a power law distribution or a Gaussian distribution. 
For such cases, our algorithm only needs to take a few samples in each round of the elimination step, and drops half of the vertices with high confidence. Therefore, the experimental results show that the sample cost of ME-AS is much less than that of AMCV.

7 Related Work

Multi-armed bandit problems are classical decision problems with exploration-exploitation trade-offs, and have been extensively studied for several decades (dating back to the 1930s). In this line of research, k-AS and kavg-AS fit into the pure exploration category, which has attracted significant attention in recent years due to its abundant applications such as online advertisement placement [6], channel allocation for mobile communications [2], crowdsourcing [16], etc. We mention some closely related work below, and refer the interested reader to a recent survey [4].

Even-Dar et al. [8] proposed an optimal algorithm for selecting a single arm which approximates the best arm with an additive error at most ε (a matching lower bound was established by Mannor et al. [12]). Kalyanakrishnan et al. [10, 11] considered the EXPLORE-k problem which we mentioned in Section 1.2. They provided an algorithm with sample complexity O((n/ε′²) log(k/δ)). Similarly, Zhou et al. [16] studied the OPTMAI problem which, again as mentioned in Section 1.2, is the absolute-error version of kavg-AS.

Audibert et al. [2] and Bubeck et al. [4] investigated the fixed budget setting where, given a fixed number of samples, we want to minimize the so-called misidentification probability (informally, the probability that the solution is not optimal). Bubeck et al. [5] also showed the links between the simple regret (the gap between the arm we obtain and the best arm) and the cumulative regret (the gap between the reward we obtained and the expected reward of the best arm). 
Gabillon et al. [9] provided a unified approach, UGapE, for EXPLORE-k in both the fixed-budget and fixed-confidence settings. They derived the algorithms based on "lower and upper confidence bounds" (LUCB), where the time complexity depends on the gap between θ_k(B) and the other arms. Note that each time LUCB samples the two arms that are the most difficult to distinguish. Since our problem ensures an individual guarantee, it is unclear whether sampling only the most difficult-to-distinguish arms would suffice; we leave this as an intriguing direction for future work. Chen et al. [6] studied how to select the best arms under various combinatorial constraints.

Acknowledgements. Jian Li, Wei Cao, and Zhize Li were supported in part by the National Basic Research Program of China grants 2015CB358700, 2011CBA00300, 2011CBA00301, and the National NSFC grants 61202009, 61033001, 61361136003. Yufei Tao was supported in part by projects GRF 4168/13 and GRF 142072/14 from HKRGC.

References

[1] Y. Amsterdamer, S. B. Davidson, T. Milo, S. Novgorodov, and A. Somech. OASSIS: query driven crowd mining. In SIGMOD, pages 589–600, 2014.
[2] J.-Y. Audibert, S. Bubeck, et al. Best arm identification in multi-armed bandits. In COLT, 2010.
[3] Z. Bar-Yossef. The complexity of massive data set computations. PhD thesis, University of California, 2002.
[4] S. Bubeck, N. Cesa-Bianchi, et al. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5(1):1–122, 2012.
[5] S. Bubeck, R. Munos, and G. Stoltz.
Pure exploration in finitely-armed and continuous-armed bandits. Theoretical Computer Science, 412(19):1832–1852, 2011.
[6] S. Chen, T. Lin, I. King, M. R. Lyu, and W. Chen. Combinatorial pure exploration of multi-armed bandits. In Advances in Neural Information Processing Systems, pages 379–387, 2014.
[7] D. P. Dubhashi and A. Panconesi. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, 2009.
[8] E. Even-Dar, S. Mannor, and Y. Mansour. Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. The Journal of Machine Learning Research, 7:1079–1105, 2006.
[9] V. Gabillon, M. Ghavamzadeh, and A. Lazaric. Best arm identification: A unified approach to fixed budget and fixed confidence. In Advances in Neural Information Processing Systems, pages 3212–3220, 2012.
[10] S. Kalyanakrishnan and P. Stone. Efficient selection of multiple bandit arms: Theory and practice. In ICML, pages 511–518, 2010.
[11] S. Kalyanakrishnan, A. Tewari, P. Auer, and P. Stone. PAC subset selection in stochastic multi-armed bandits. In ICML, pages 655–662, 2012.
[12] S. Mannor and J. N. Tsitsiklis. The sample complexity of exploration in the multi-armed bandit problem. The Journal of Machine Learning Research, 5:623–648, 2004.
[13] A. G. Parameswaran, S. Boyd, H. Garcia-Molina, A. Gupta, N. Polyzotis, and J. Widom. Optimal crowd-powered rating and filtering algorithms. PVLDB, 7(9):685–696, 2014.
[14] C. Sheng, Y. Tao, and J. Li. Exact and approximate algorithms for the most connected vertex problem. TODS, 37(2):12, 2012.
[15] J. Wang, E. Lo, and M. L. Yiu. Identifying the most connected vertices in hidden bipartite graphs using group testing. TKDE, 25(10):2245–2256, 2013.
[16] Y. Zhou, X. Chen, and J. Li.
Optimal PAC multiple arm identification with applications to crowdsourcing. In ICML, pages 217–225, 2014.
[17] M. Zhu, D. Papadias, J. Zhang, and D. L. Lee. Top-k spatial joins. TKDE, 17(4):567–579, 2005.