{"title": "Online Submodular Set Cover, Ranking, and Repeated Active Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 1107, "page_last": 1115, "abstract": "We propose an online prediction version of submodular set cover with connections to ranking and repeated active learning. In each round, the learning algorithm chooses a sequence of items. The algorithm then receives a monotone submodular function and suffers loss equal to the cover time of the function: the number of items needed, when items are selected in order of the chosen sequence, to achieve a coverage constraint. We develop an online learning algorithm whose loss converges to approximately that of the best sequence in hindsight. Our proposed algorithm is readily extended to a setting where multiple functions are revealed at each round and to bandit and contextual bandit settings.", "full_text": "Online Submodular Set Cover,\n\nRanking, and Repeated Active Learning\n\nAndrew Guillory\n\nDepartment of Computer Science\n\nUniversity of Washington\n\nguillory@cs.washington.edu\n\nbilmes@ee.washington.edu\n\nJeff Bilmes\n\nDepartment of Electrical Engineering\n\nUniversity of Washington\n\nAbstract\n\nWe propose an online prediction version of submodular set cover with connections\nto ranking and repeated active learning. In each round, the learning algorithm\nchooses a sequence of items. The algorithm then receives a monotone submodu-\nlar function and suffers loss equal to the cover time of the function: the number of\nitems needed, when items are selected in order of the chosen sequence, to achieve\na coverage constraint. We develop an online learning algorithm whose loss con-\nverges to approximately that of the best sequence in hindsight. 
Our proposed\nalgorithm is readily extended to a setting where multiple functions are revealed at\neach round and to bandit and contextual bandit settings.\n\n1 Problem\n\nIn an online ranking problem, at each round we choose an ordered list of items and then incur some\nloss. Problems with this structure include search result ranking, ranking news articles, and ranking\nadvertisements. In search result ranking, each round corresponds to a search query and the items\ncorrespond to search results. We consider online ranking problems in which the loss incurred at\neach round is the number of items in the list needed to achieve some goal. For example, in search\nresult ranking a reasonable loss is the number of results the user needs to view before they \ufb01nd the\ncomplete information they need. We are speci\ufb01cally interested in problems where the list of items is\na sequence of questions to ask or tests to perform in order to learn. In this case the ranking problem\nbecomes a repeated active learning problem. For example, consider a medical diagnosis problem\nwhere at each round we choose a sequence of medical tests to perform on a patient with an unknown\nillness. The loss is the number of tests we need to perform in order to make a con\ufb01dent diagnosis.\nWe propose an approach to these problems using a new online version of submodular set cover.\nA set function F (S) de\ufb01ned over a ground set V is called submodular if it satis\ufb01es the following\ndiminishing returns property: for every A \u2286 B \u2286 V \\{v}, F (A + v)\u2212 F (A) \u2265 F (B + v)\u2212 F (B).\nMany natural objectives measuring information, in\ufb02uence, and coverage turn out to be submodular\n[1, 2, 3]. A set function is called monotone if for every A \u2286 B, F (A) \u2264 F (B) and normalized if\nF (\u2205) = 0. 
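The diminishing-returns and monotonicity properties just defined can be checked concretely. Below is a minimal Python sketch (not from the paper); the universe and the per-item covered sets are hypothetical:

```python
# A normalized, monotone submodular coverage objective: F(S) is the fraction
# of a (hypothetical) universe covered by the sets attached to the items in S.
covers = {"a": {1, 2}, "b": {2, 3}, "c": {4}}   # item -> elements it covers
universe = {1, 2, 3, 4}

def F(S):
    covered = set().union(*(covers[v] for v in S)) if S else set()
    return len(covered) / len(universe)

# Diminishing returns: the gain of adding "c" to the smaller set A
# is at least its gain on the superset B; F is also normalized, F(empty) = 0.
A, B = {"a"}, {"a", "b"}
gain_A = F(A | {"c"}) - F(A)
gain_B = F(B | {"c"}) - F(B)
assert gain_A >= gain_B and F(set()) == 0.0
```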
Submodular set cover is the problem of selecting an S ⊆ V minimizing |S| under the constraint that F(S) ≥ 1 where F is submodular, monotone, and normalized (note we can always rescale F). This problem is NP-hard, but a greedy algorithm gives a solution with cost less than 1 + ln 1/ε times that of the optimal solution, where ε is the smallest non-zero gain of F [4].

We propose the following online prediction version of submodular set cover, which we simply call online submodular set cover. At each time step t = 1 … T we choose a sequence of elements S^t = (v^t_1, v^t_2, … v^t_n) where each v^t_i is chosen from a ground set V of size n (we use a superscript for rounds of the online problem and a subscript for other indices). After choosing S^t, an adversary reveals a submodular, monotone, normalized function F^t, and we suffer loss ℓ(F^t, S^t) where

ℓ(F^t, S^t) ≜ min({n} ∪ {i : F^t(S^t_i) ≥ 1})    (1)

and S^t_i ≜ ∪_{j≤i} {v^t_j} is defined to be the set containing the first i elements of S^t (let S^t_0 ≜ ∅). Note ℓ can be equivalently written ℓ(F^t, S^t) ≜ Σ^{n−1}_{i=0} I(F^t(S^t_i) < 1) where I is the indicator function. Intuitively, ℓ(F^t, S^t) corresponds to a bounded version of cover time: it is the number of items, up to n, needed to achieve F^t(S) ≥ 1 when we select items in the order specified by S^t. Thus, if coverage is not achieved, we suffer a loss of n. We assume that F^t(V) ≥ 1 (therefore coverage is achieved if S^t does not contain duplicates) and that the sequence of functions (F^t)_t is chosen in advance (by an oblivious adversary). The goal of our learning algorithm is to minimize the total loss Σ_t ℓ(F^t, S^t).

To make the problem clear, we present it first in its simplest, full information version.
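The cover-time loss ℓ(F^t, S^t) can be transcribed directly. A small sketch, where the items and the "needed" set below are illustrative rather than from the paper:

```python
def cover_time(F, seq):
    """Loss l(F, S): number of items, up to n = len(seq), needed until
    F(prefix) >= 1; returns n if coverage is never reached."""
    n = len(seq)
    prefix = set()
    for i, v in enumerate(seq, start=1):
        prefix.add(v)
        if F(prefix) >= 1:
            return i
    return n

# Example: F is the fraction of a hypothetical "needed" set that appears in S.
needed = {"x", "y"}
F = lambda S: len(S & needed) / len(needed)
assert cover_time(F, ["x", "z", "y", "w"]) == 3   # covered after x, z, y
assert cover_time(F, ["z", "w", "q", "r"]) == 4   # never covered -> loss n
```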
However, we will later consider more complex variations including (1) a version where we only produce a list of length k ≤ n instead of n, (2) a multiple objective version where a set of objectives F^t_1, F^t_2, … F^t_m is revealed each round, (3) a bandit (partial information) version where we do not get full access to F^t and instead only observe the values F^t(S^t_1), F^t(S^t_2), … F^t(S^t_n), and (4) a contextual bandit version where there is some context associated with each round.

We argue that online submodular set cover, as we have defined it, is an interesting and useful model for ranking and repeated active learning problems. In a search result ranking problem, after presenting search results to a user we can obtain implicit feedback from this user (e.g., clicks, time spent viewing each result) to determine which results were actually relevant. We can then construct an objective F^t(S) such that F^t(S) ≥ 1 iff S covers or summarizes the relevant results. Alternatively, we can avoid explicitly constructing an objective by considering the bandit version of the problem where we only observe the values F^t(S^t_i). For example, if the user clicked on k total results then we can let F^t(S^t_i) ≜ c_i/k where c_i ≤ k is the number of results in the subset S^t_i which were clicked. Note that the user may click an arbitrary set of results in an arbitrary order, and the user's decision whether or not to click a result may depend on previously viewed and clicked results. All that we assume is that there is some unknown submodular function explaining the click counts. If the user clicks on a small number of very early results, then coverage is achieved quickly and the ordering is desirable.
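The click-count objective just described, F^t(S^t_i) = c_i/k, can be sketched as follows; the clicked result set is hypothetical:

```python
def make_click_objective(clicked):
    """Monotone submodular (in fact modular) objective: the fraction of the
    user's k clicked results contained in S."""
    k = len(clicked)
    return lambda S: len(S & clicked) / k

F = make_click_objective({"doc2", "doc5"})
ranking = ["doc1", "doc2", "doc3", "doc4", "doc5"]
values = [F(set(ranking[:i])) for i in range(1, 6)]
assert values == [0.0, 0.5, 0.5, 0.5, 1.0]   # covered once both clicks are seen
```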
This coverage objective makes sense if we assume that the set of results the user clicked are of roughly equal importance and together summarize the results of interest to the user.

In the medical diagnosis application, we can define F^t(S) to be proportional to the number of candidate diseases which are eliminated after performing the set of tests S on patient t. If we assume that a particular test result always eliminates a fixed set of candidate diseases, then this function is submodular. Specifically, this objective is the reduction in the size of the version space [5, 6]. Other active learning problems can also be phrased in terms of satisfying a submodular coverage constraint, including problems that allow for noise [7]. Note that, as in the search result ranking problem, F^t is not initially known but can be inferred after we have chosen S^t and suffered loss ℓ(F^t, S^t).

2 Background and Related Work

Recently, Azar and Gamzu [8] extended the O(ln 1/ε) greedy approximation algorithm for submodular set cover to the more general problem of minimizing the average cover time of a set of objectives. Here ε is the smallest non-zero gain of all the objectives. Azar and Gamzu [8] call this problem ranking with submodular valuations. More formally, we have a known set of functions F_1, F_2, …, F_m, each with an associated weight w_i. The goal is then to choose a permutation S of the ground set V to minimize Σ^m_{i=1} w_i ℓ(F_i, S). The offline approximation algorithm for ranking with submodular valuations will be a crucial tool in our analysis of online submodular set cover. In particular, this offline algorithm can be viewed as constructing the best single permutation S for a sequence of objectives F^1, F^2 … F^T in hindsight (i.e., after all the objectives are known).
Recently the ranking with submodular valuations problem was extended to metric costs [9].

Online learning is a well-studied problem [10]. In one standard setting, the online learning algorithm has a collection of actions A, and at each time step t the algorithm picks an action S^t ∈ A. The learning algorithm then receives a loss function ℓ^t, and the algorithm incurs the loss value for the action it chose, ℓ^t(S^t). We assume ℓ^t(S^t) ∈ [0, 1] but make no other assumptions about the form of loss. The performance of an online learning algorithm is often measured in terms of regret, the difference between the loss incurred by the algorithm and the loss of the best single fixed action in hindsight:

R = Σ^T_{t=1} ℓ^t(S^t) − min_{S∈A} Σ^T_{t=1} ℓ^t(S).

There are randomized algorithms which guarantee E[R] ≤ √(T ln |A|) for adversarial sequences of loss functions [11]. Note that because E[R] = o(T) the per round regret approaches zero. In the bandit version of this problem the learning algorithm only observes ℓ^t(S^t) [12].

Our problem fits in this standard setting with A chosen to be the set of all ground set permutations (v_1, v_2, … v_n) and ℓ^t(S^t) ≜ ℓ(F^t, S^t)/n. However, in this case A is very large, so standard online learning algorithms which keep weight vectors of size |A| cannot be directly applied. Furthermore, our problem generalizes an NP-hard offline problem which has no polynomial time approximation scheme, so it is not likely that we will be able to derive any efficient algorithm with o(T ln |A|) regret. We therefore instead consider α-regret, the loss incurred by the algorithm as compared to α times the best fixed prediction:

R_α = Σ^T_{t=1} ℓ^t(S^t) − α min_{S∈A} Σ^T_{t=1} ℓ^t(S).

α-regret is a standard notion of regret for online versions of NP-hard problems.
If we can show R_α grows sublinearly with T then we have shown loss converges to that of an offline approximation with ratio α.

Streeter and Golovin [13] give online algorithms for the closely related problems of submodular function maximization and min-sum submodular set cover. In online submodular function maximization, the learning algorithm selects a set S^t with |S^t| ≤ k before F^t is revealed, and the goal is to maximize Σ_t F^t(S^t). This problem differs from ours in that our problem is a loss minimization problem as opposed to an objective maximization problem. Online min-sum submodular set cover is similar to online submodular set cover except the loss is not cover time but rather

ℓ̂(F^t, S^t) ≜ Σ^n_{i=0} max(1 − F^t(S^t_i), 0).    (2)

Min-sum submodular set cover penalizes 1 − F^t(S^t_i) where submodular set cover uses I(F^t(S^t_i) < 1). We claim that for certain applications the hard threshold makes more sense. For example, in repeated active learning problems minimizing Σ_t ℓ(F^t, S^t) naturally corresponds to minimizing the number of questions asked. Minimizing Σ_t ℓ̂(F^t, S^t) does not have this interpretation, as it charges less for questions asked when F^t is closer to 1. One might hope that minimizing ℓ could be reduced to or shown equivalent to minimizing ℓ̂. This is not likely to be the case, as the approximation algorithm of Streeter and Golovin [13] does not carry over to online submodular set cover. Their online algorithm is based on approximating an offline algorithm which greedily maximizes Σ_t min(F^t(S), 1). Azar and Gamzu [8] show that this offline algorithm, which they call the cumulative greedy algorithm, does not achieve a good approximation ratio for average cover time. Radlinski et al.
[14] consider a special case of online submodular function maximization applied to search result ranking. In their problem the objective function is assumed to be a binary valued submodular function with 1 indicating the user clicked on at least one document. The goal is then to maximize the number of queries which receive at least one click. For binary valued functions ℓ̂ and ℓ are the same, so in this setting minimizing the number of documents a user must view before clicking on a result is a min-sum submodular set cover problem. Our results generalize this problem to minimizing the number of documents a user must view before some possibly non-binary submodular objective is met. With non-binary objectives we can incorporate richer implicit feedback such as multiple clicks and time spent viewing results. Slivkins et al. [15] generalize the results of Radlinski et al. [14] to a metric space bandit setting.

Our work differs from the online set cover problem of Alon et al. [16]; this problem is a single set cover problem in which the items that need to be covered are revealed one at a time. Kakade et al. [17] analyze general online optimization problems with linear loss. If we assume that the functions F^t are all taken from a known finite set of functions F then we have linear loss over a |F| dimensional space. However, this approach gives poor dependence on |F|.

3 Offline Analysis

In this work we present an algorithm for online submodular set cover which extends the offline algorithm of Azar and Gamzu [8] for the ranking with submodular valuations problem. Algorithm 1 shows this offline algorithm, called the adaptive residual updates algorithm. Here we use T to denote the number of objective functions and superscript t to index the set of objectives.
This notation is chosen to make the connection to the succeeding online algorithm clear: our online algorithm will approximately implement Algorithm 1 in an online setting, and in this case the set of objectives in the offline algorithm will be the sequence of objectives in the online problem.

Algorithm 1 Offline Adaptive Residual
Input: Objectives F^1, F^2, … F^T
Output: Sequence S_1 ⊂ S_2 ⊂ … S_n
  S_0 ← ∅
  for i ← 1 … n do
    v ← argmax_{v∈V} Σ_t δ(F^t, S_{i−1}, v)
    S_i ← S_{i−1} + v
  end for

Figure 1: Histograms used in offline analysis

The algorithm is a greedy algorithm similar to the standard algorithm for submodular set cover. The crucial difference is that instead of a normal gain term of F^t(S + v) − F^t(S) it uses a relative gain term

δ(F^t, S, v) ≜ min((F^t(S + v) − F^t(S)) / (1 − F^t(S)), 1) if F^t(S) < 1, and 0 otherwise.

The intuition is that (1) a small gain for F^t matters more if F^t is close to being covered (F^t(S) close to 1) and (2) gains for F^t with F^t(S) ≥ 1 do not matter, as these functions are already covered. The main result of Azar and Gamzu [8] is that Algorithm 1 is approximately optimal.

Theorem 1 ([8]). The loss Σ_t ℓ(F^t, S) of the sequence produced by Algorithm 1 is within a factor of 4(ln(1/ε) + 2) of that of any other sequence.

We note Azar and Gamzu [8] allow for weights for each F^t. We omit weights for simplicity. Also, Azar and Gamzu [8] do not allow the sequence S to contain duplicates while we do: selecting a ground set element twice has no benefit, but allowing duplicates will be convenient for the online analysis. The proof of Theorem 1 involves representing solutions to the submodular ranking problem as histograms. Each histogram is defined such that the area of the histogram is equal to the loss of the corresponding solution.
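Algorithm 1 and the relative gain δ can be transcribed almost line for line. A sketch with toy objectives (the objectives and ground set are illustrative, not from the paper):

```python
def delta(F, S, v):
    """Relative gain delta(F, S, v): the gain of v at S, normalized by the
    residual coverage 1 - F(S); zero once F is already covered."""
    fs = F(S)
    if fs >= 1:
        return 0.0
    return min((F(S | {v}) - fs) / (1 - fs), 1.0)

def offline_adaptive_residual(V, objectives):
    """Greedy: at each step pick the element maximizing the summed
    relative gain over all objectives (Algorithm 1)."""
    S, order = set(), []
    for _ in range(len(V)):
        v = max(V, key=lambda u: sum(delta(F, S, u) for F in objectives))
        S.add(v)
        order.append(v)
    return order

# Two toy normalized monotone objectives over V = {a, b, c}.
V = {"a", "b", "c"}
F1 = lambda S: len(S & {"a"})            # covered as soon as "a" is chosen
F2 = lambda S: len(S & {"b", "c"}) / 2   # needs both "b" and "c"
order = offline_adaptive_residual(V, [F1, F2])
assert order[0] == "a"   # "a" fully covers F1 immediately (relative gain 1)
```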
The approximate optimality of Algorithm 1 is shown by proving that the histogram for the solution it finds is approximately contained within the histogram for the optimal solution. In order to convert Algorithm 1 into an online algorithm, we will need a stronger version of Theorem 1. Specifically, we will need to show that Algorithm 1 is still approximately optimal when there is some additive error in the greedy selection rule.

For the optimal solution S* = argmin_{S∈V^n} Σ_t ℓ(F^t, S) (V^n is the set of all length n sequences of ground set elements), define a histogram h* with T columns, one for each function F^t. Let the tth column have width 1 and height equal to ℓ(F^t, S*). Assume that the columns are ordered by increasing cover time so that the histogram is monotone non-decreasing. Note that the area of this histogram is exactly the loss of S*.

For a sequence of sets ∅ = S_0 ⊆ S_1 ⊆ … S_n (e.g., those found by Algorithm 1) define a corresponding sequence of truncated objectives

F̂^t_i(S) ≜ min((F^t(S ∪ S_{i−1}) − F^t(S_{i−1})) / (1 − F^t(S_{i−1})), 1) if F^t(S_{i−1}) < 1, and 1 otherwise.

F̂^t_i(S) is essentially F^t except with (1) S_{i−1} given "for free", and (2) values rescaled to range between 0 and 1. We note that F̂^t_i is submodular and that if F^t(S) ≥ 1 then F̂^t_i(S) ≥ 1. In this sense F̂^t_i is an easier objective than F^t. Also, for any v, F̂^t_i({v}) − F̂^t_i(∅) = δ(F^t, S_{i−1}, v). In other words, the gain of F̂^t_i at ∅ is the normalized gain of F^t at S_{i−1}. This property will be crucial.

We next define truncated versions of h*: ĥ_1, ĥ_2, … ĥ_n, which correspond to the loss of S* for the easier covering problems involving F̂^t_i. For each j ∈ 1 … n, let ĥ_i have T columns of height j, with the tth such column of width F̂^t_i(S*_j) − F̂^t_i(S*_{j−1}) (some of these columns may have 0 width). Assume again the columns are ordered by height. Figure 1 shows h* and ĥ_i.

We assume without loss of generality that F^t(S*_n) ≥ 1 for every t (clearly some choice of S* contains no duplicates, so under our assumption that F^t(V) ≥ 1 we also have F^t(S*_n) ≥ 1). Note that the total width of ĥ_i is then the number of functions remaining to be covered after S_{i−1} is given for free (i.e., the number of F^t with F^t(S_{i−1}) < 1). It is not hard to see that the total area of ĥ_i is Σ_t ℓ̂(F̂^t_i, S*) where ℓ̂ is the loss function for min-sum submodular set cover (2). From this we know ĥ_i has area less than h*. In fact, Azar and Gamzu [8] show the following.

Lemma 1 ([8]). ĥ_i is completely contained within h* when ĥ_i and h* are aligned along their lower right boundaries.

We need one final lemma before proving the main result of this section. For a sequence S define Q_i = Σ_t δ(F^t, S_{i−1}, v_i) to be the total normalized gain of the ith selected element and let Δ_i = Σ^n_{j=i} Q_j be the sum of the normalized gains from i to n. Define Π_i = |{t : F^t(S_{i−1}) < 1}| to be the number of functions which are still uncovered before v_i is selected (i.e., the loss incurred at step i). Azar and Gamzu [8] show the following result relating Δ_i to Π_i.

Lemma 2 ([8]). For any i, Δ_i ≤ (ln 1/ε + 2) Π_i.

We now state and prove the main result of this section: Algorithm 1 is approximately optimal even when the ith greedy selection is performed with some additive error R_i.
This theorem shows that in order to achieve low average cover time it suffices to approximately implement Algorithm 1. Aside from being useful for converting Algorithm 1 into an online algorithm, this theorem may be useful for applications in which the ground set V is very large. In these situations it may be possible to approximate Algorithm 1 (e.g., through sampling). Streeter and Golovin [13] prove similar results for submodular function maximization and min-sum submodular set cover. Our result is similar, but the proof is non-trivial. The loss function ℓ is highly non-linear with respect to changes in F^t(S^t_i), so it is conceivable that small additive errors in the greedy selection could have a large effect. The analysis of Im and Nagarajan [9] involves a version of Algorithm 1 which is robust to a sort of multiplicative error in each stage of the greedy selection.

Theorem 2. Let S = (v_1, v_2, … v_n) be any sequence such that for every i,

Σ_t δ(F^t, S_{i−1}, v_i) + R_i ≥ max_{v∈V} Σ_t δ(F^t, S_{i−1}, v).

Then Σ_t ℓ(F^t, S) ≤ 4(ln 1/ε + 2) Σ_t ℓ(F^t, S*) + n Σ_i R_i.

Proof. Let h be a histogram with a column for each Π_i with Π_i ≠ 0. Let γ = ln 1/ε + 2. Let the ith column have width (Q_i + R_i)/(2γ) and height max(Π_i − Σ_j R_j, 0)/(2(Q_i + R_i)). Note that Π_i ≠ 0 iff Q_i + R_i ≠ 0, as if there are functions not yet covered then there is some set element with non-zero gain (and vice versa). The area of h is

Σ_{i:Π_i≠0} [(Q_i + R_i)/(2γ)] · [max(Π_i − Σ_j R_j, 0)/(2(Q_i + R_i))] = (1/(4γ)) Σ_i max(Π_i − Σ_j R_j, 0) ≥ (1/(4γ)) Σ_t ℓ(F^t, S) − (n/(4γ)) Σ_j R_j.

Assume h and every ĥ_i are aligned along their lower right boundaries. We show that if the ith column of h has non-zero area then it is contained within ĥ_i. Then, it follows from Lemma 1 that h is contained within h*, completing the proof.

Consider the ith column in h. Assume this column has non-zero area, so Π_i ≥ Σ_j R_j. This column is at most (Δ_i + Σ_{j≥i} R_j)/(2γ) away from the right hand boundary. To show that this column is in ĥ_i it suffices to show that after selecting the first k = ⌊(Π_i − Σ_j R_j)/(2(Q_i + R_i))⌋ items in S* we still have Σ_t (1 − F̂^t_i(S*_k)) ≥ (Δ_i + Σ_{j≥i} R_j)/(2γ). The most that Σ_t F̂^t_i can increase through the addition of one item is Q_i + R_i. Therefore, using the submodularity of F̂^t_i,

Σ_t F̂^t_i(S*_k) − Σ_t F̂^t_i(∅) ≤ k(Q_i + R_i) ≤ Π_i/2 − Σ_j R_j/2.

Therefore Σ_t (1 − F̂^t_i(S*_k)) ≥ Π_i/2 + Σ_j R_j/2, since Σ_t (1 − F̂^t_i(∅)) = Π_i. Using Lemma 2,

Π_i/2 + Σ_j R_j/2 ≥ Δ_i/(2γ) + Σ_j R_j/2 ≥ (Δ_i + Σ_{j≥i} R_j)/(2γ).

Algorithm 2 Online Adaptive Residual
Input: Integer T
  Initialize n online learning algorithms E_1, E_2, … E_n with A = V
  for t = 1 → T do
    ∀i ∈ 1 … n predict v^t_i with E_i
    S^t ← (v^t_1, … v^t_n)
    Receive F^t, pay loss ℓ(F^t, S^t)
    For E_i, ℓ^t(v) ← 1 − δ(F^t, S^t_{i−1}, v)
  end for

Figure 2: E_i selects the ith element in S^t.

4 Online Analysis

We now show how to convert Algorithm 1 into an online algorithm. We use the same idea used by Streeter and Golovin [13] and Radlinski et al.
[14] for online submodular function maximization: we run n copies of some low regret online learning algorithm, E_1, E_2, … E_n, each with action space A = V. We use the ith copy E_i to select the ith item in each predicted sequence S^t. In other words, the predictions of E_i will be v^1_i, v^2_i, … v^T_i. Figure 2 illustrates this. Our algorithm assigns loss values to each E_i so that, assuming E_i has low regret, E_i approximately implements the ith greedy selection in Algorithm 1. Algorithm 2 shows this approach. Note that under our assumption that F^1, F^2, … F^T is chosen by an oblivious adversary, the loss values for the ith copy of the online algorithm are oblivious to the predictions of that run of the algorithm. Therefore we can use any algorithm for learning against an oblivious adversary.

Theorem 3. Assume we use as a subroutine an online prediction algorithm with expected regret E[R] ≤ √(T ln n). Algorithm 2 has expected α-regret E[R_α] ≤ n² √(T ln n) for α = 4(ln(1/ε) + 2).

Proof. Define a meta-action ṽ_i for the sequence of actions chosen by E_i, ṽ_i = (v^1_i, v^2_i, … v^T_i). We can extend the domain of F^t to allow for meta-actions: F^t(S ∪ {ṽ_i}) = F^t(S ∪ {v^t_i}). Let S̃ be the sequence of meta-actions S̃ = (ṽ_1, ṽ_2, … ṽ_n). Let R_i be the regret of E_i. Note that from the definition of regret and our choice of loss values we have that

max_{v∈V} Σ_t δ(F^t, S̃_{i−1}, v) − Σ_t δ(F^t, S̃_{i−1}, ṽ_i) = R_i.

Therefore, S̃ approximates the greedy solution in the sense required by Theorem 2 (Theorem 2 did not require that S be constructed from elements of V). From Theorem 2 we then have

Σ_t ℓ(F^t, S^t) = Σ_t ℓ(F^t, S̃) ≤ α Σ_t ℓ(F^t, S*) + n Σ_i R_i.

The expected α-regret is then E[n Σ_i R_i] ≤ n² √(T ln n).

We describe several variations and extensions of this analysis, some of which mirror those for related work [13, 14, 15].

Avoiding Duplicate Items Since each run of the online prediction algorithm is independent, Algorithm 2 may select the same ground set element multiple times. This drawback is easy to fix. We can simply select any arbitrary v_i ∉ S_{i−1} if E_i selects a v_i ∈ S_{i−1}. This modification does not affect the regret guarantee, as selecting a v_i ∈ S_{i−1} will always result in a gain of zero (loss of 1).

Truncated Loss In some applications we only care about the first k items in the sequence S^t. For these applications it makes sense to consider a truncated version of ℓ(F^t, S^t) with parameter k:

ℓ_k(F^t, S^t) ≜ min({k} ∪ {|S^t_i| : F^t(S^t_i) ≥ 1}).

This is cover time computed up to the kth element in S^t. The analysis for Theorem 2 also shows Σ_t ℓ_k(F^t, S^t) ≤ 4(ln 1/ε + 2) Σ_t ℓ(F^t, S*) + k Σ^k_{i=1} R_i. The corresponding regret bound is then k² √(T ln n). Note here we are bounding truncated loss Σ_t ℓ_k(F^t, S^t) in terms of untruncated loss Σ_t ℓ(F^t, S*). In this sense this bound is weaker. However, we replace n² with k², which may be much smaller. Algorithm 2 achieves this bound simultaneously for all k.

Multiple Objectives per Round Consider a variation of online submodular set cover in which instead of receiving a single objective F^t each round we receive a batch of objectives F^t_1, F^t_2, … F^t_m and incur loss (1/m) Σ^m_{i=1} ℓ(F^t_i, S^t). In other words, each round corresponds to a ranking with submodular valuations problem. It is easy to extend Algorithm 2 to this setting by using 1 − (1/m) Σ^m_{i=1} δ(F^t_i, S^t_{i−1}, v) for the loss of action v in E_i. We then get O(k² √(mL* ln n) + k²m ln n) total regret, where L* = Σ^T_{t=1} Σ^m_{i=1} ℓ(F^t_i, S*) (Section 2.6 of [10]).

Bandit Setting Consider a setting where instead of receiving full access to F^t we only observe the sequence of objective function values F^t(S^t_1), F^t(S^t_2), … F^t(S^t_n) (or in the case of multiple objectives per round, F^t_i(S^t_j) for every i and j). We can extend Algorithm 2 to this setting using a nonstochastic multiarmed bandits algorithm [12]. We note duplicate removal becomes more subtle in the bandit setting: should we feed back a gain of zero when a duplicate is selected, or the gain of the non-duplicate replacement? We propose either is valid if replacements are chosen obliviously.

Bandit Setting with Expert Advice We can further generalize the bandit setting to the contextual bandit setting [18] (e.g., the bandit setting with expert advice [12]). Say that we have access at time step t to predictions from a set of m experts. Let ṽ_j be the meta-action corresponding to the sequence of predictions from the jth expert and Ṽ be the set of all ṽ_j. Assume that E_i guarantees low regret with respect to Ṽ:

Σ_t δ(F^t, S^t_{i−1}, v^t_i) + R_i ≥ max_{ṽ∈Ṽ} Σ_t δ(F^t, S^t_{i−1}, ṽ),    (3)

where we have extended the domain of each F^t to include meta-actions as in the proof of Theorem 3. Additionally assume that F^t(Ṽ) ≥ 1 for every t. In this case we can show Σ_t ℓ_k(F^t, S^t) ≤ 4(ln 1/ε + 2) min_{S*∈Ṽ^m} Σ_t ℓ_m(F^t, S*) + k Σ^k_{i=1} R_i. The Exp4 algorithm [12] has R_i = O(√(nT ln m)), giving total regret O(k² √(nT ln m)). Experts may use context in forming recommendations. For example, in a search ranking problem the context could be the query.

5 Experimental Results

5.1 Synthetic Example

We present a synthetic example for which the online cumulative greedy algorithm [13] fails, based on the example in Azar and Gamzu [8] for the offline setting. Consider an online ad placement problem where the ground set V is a set of available ad placement actions (e.g., a v ∈ V could correspond to placing an ad on a particular web page for a particular length of time). On round t, we receive an ad from an advertiser, and our goal is to acquire λ clicks for the ad using as few advertising actions as possible. Define F^t(S^t_i) to be min(c^t_i, λ)/λ where c^t_i is the number of clicks acquired from the ad placement actions S^t_i.

Say that we have n advertising actions of two types: 2 broad actions and n − 2 narrow actions. Say that the ads we receive are also of two types. Common type ads occur with probability (n − 1)/n and receive 1 and λ − 1 clicks respectively from the two broad actions and 0 clicks from narrow actions. Uncommon type ads occur with probability 1/n and receive λ clicks from one randomly chosen narrow action and 0 clicks from all other actions. Assume λ ≥ n². Intuitively broad actions could correspond to ad placements on sites for which many ads are relevant.
The optimal strategy\ngiving an average cover time O(1) is to \ufb01rst select the two broad actions covering all common ads\nthen select the narrow actions in any order. However, the of\ufb02ine cumulative greedy algorithm will\npick all narrow actions before picking the broad action with gain 1 giving average cover time O(n).\nThe left of Figure 3 shows average cover time for our proposed algorithm and the cumulative greedy\nalgorithm of [13] on the same sequences of random objectives. For this example we use n = 25\nand the bandit version of the problem with the Exp3 algorithm [12]. We also plot the average cover\ntimes for of\ufb02ine solutions as baselines. As seen in the \ufb01gure, the cumulative algorithms converge to\nhigher average cover times than the adaptive residual algorithms. Interestingly, the online cumulative\nalgorithm does better than the of\ufb02ine cumulative algorithm: it seems added randomization helps.\n\n7\n\n\fFigure 3: Average cover time\n\n5.2 Repeated Active Learning for Movie Recommendation\n\nConsider a movie recommendation website which asks users a sequence of questions before they are\ngiven recommendations. We de\ufb01ne an online submodular set cover problem for choosing sequences\nof questions in order to quickly eliminate a large number of movies from consideration. This is\nsimilar conceptually to the diagnosis problem discussed in the introduction. De\ufb01ne the ground set\nV to be a set of questions (for example \u201cDo you want to watch something released in the past\n10 years?\u201d or \u201cDo you want to watch something from the Drama genre?\u201d). De\ufb01ne F t(S) to be\nproportional to the number of movies eliminated from consideration after asking the tth user S.\nSpeci\ufb01cally, let H be the set of all movies in our database and V t(S) be the subset of movies\nconsistent with the tth user\u2019s responses to S. De\ufb01ne F t(S) (cid:44) min(|H \\ V t(S)|/c, 1) where c is a\nconstant. 
F^t(S) ≥ 1 iff after asking the set of questions S we have eliminated at least c movies.
We set H to be a set of 11634 movies available on Netflix's Watch Instantly service and use 803 questions based on those we used for an offline problem [7]. To simulate user responses to questions, on round t we randomly select a movie from H and assume the tth user answers questions consistently with this movie. We set c = |H| − 500, so the goal is to eliminate about 95% of all movies. We evaluate in the full information setting: this makes sense if we assume we receive as feedback the movie the user actually selected. As our online prediction subroutine we tried Normal-Hedge [19], a second order multiplicative weights method [20], and a version of multiplicative weights for small gains using the doubling trick (Section 2.6 of [10]). We also tried a heuristic modification of Normal-Hedge which fixes c_t = 1 for a fixed, more aggressive learning rate than theoretically justified. The right of Figure 3 shows average cover time for 100 runs of T = 10000 iterations. Note the different scale in the bottom row: these methods performed significantly worse than Normal-Hedge. The online cumulative greedy algorithm converges to an average cover time close to, but slightly worse than, that of the adaptive greedy method. The differences are more dramatic for prediction subroutines that converge slowly. The modified Normal-Hedge has no theoretical justification, so it may not generalize to other problems. For the modified Normal-Hedge the final average cover times are 7.72 adaptive and 8.22 cumulative. The offline values are 6.78 and 7.15.

6 Open Problems

It is not yet clear what practical value our proposed approach will have for web search result ranking. A drawback to our approach is that we pick a fixed order in which to ask questions.
For some problems it makes more sense to consider adaptive strategies [5, 6].

Acknowledgments

This material is based upon work supported in part by the National Science Foundation under grant IIS-0535100, by an Intel research award, a Microsoft research award, and a Google research award.

References

[1] H. Lin and J. Bilmes. A class of submodular functions for document summarization. In HLT, 2011.
[2] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In KDD, 2003.
[3] A. Krause, A. Singh, and C. Guestrin. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. JMLR, 2008.
[4] L.A. Wolsey. An analysis of the greedy algorithm for the submodular set covering problem. Combinatorica, 2(4), 1982.
[5] D. Golovin and A. Krause. Adaptive submodularity: A new approach to active learning and stochastic optimization. In COLT, 2010.
[6] A. Guillory and J. Bilmes. Interactive submodular set cover. In ICML, 2010.
[7] A. Guillory and J. Bilmes. Simultaneous learning and covering with adversarial noise. In ICML, 2011.
[8] Y. Azar and I. Gamzu. Ranking with submodular valuations. In SODA, 2011.
[9] S. Im and V. Nagarajan. Minimum latency submodular cover in metrics. ArXiv e-prints, October 2011.
[10] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
[11] Y. Freund and R. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Computational Learning Theory, pages 23–37, 1995.
[12] P. Auer, N. Cesa-Bianchi, Y. Freund, and R.E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2003.
[13] M. Streeter and D. Golovin. An online algorithm for maximizing submodular functions. In NIPS, 2008.
[14] F. Radlinski, R. Kleinberg, and T. Joachims. Learning diverse rankings with multi-armed bandits. In ICML, 2008.
[15] A. Slivkins, F. Radlinski, and S. Gollapudi. Learning optimally diverse rankings over large document collections. In ICML, 2010.
[16] N. Alon, B. Awerbuch, and Y. Azar. The online set cover problem. In STOC, 2003.
[17] S.M. Kakade, A.T. Kalai, and K. Ligett. Playing games with approximation algorithms. In STOC, 2007.
[18] J. Langford and T. Zhang. The epoch-greedy algorithm for contextual multi-armed bandits. In NIPS, 2007.
[19] K. Chaudhuri, Y. Freund, and D. Hsu. A parameter-free hedging algorithm. In NIPS, 2009.
[20] N. Cesa-Bianchi, Y. Mansour, and G. Stoltz. Improved second-order bounds for prediction with expert advice. Machine Learning, 2007.