{"title": "Monotone k-Submodular Function Maximization with Size Constraints", "book": "Advances in Neural Information Processing Systems", "page_first": 694, "page_last": 702, "abstract": "A $k$-submodular function is a generalization of a submodular function, where the input consists of $k$ disjoint subsets, instead of a single subset, of the domain.Many machine learning problems, including influence maximization with $k$ kinds of topics and sensor placement with $k$ kinds of sensors, can be naturally modeled as the problem of maximizing monotone $k$-submodular functions.In this paper, we give constant-factor approximation algorithms for maximizing monotone $k$-submodular functions subject to several size constraints.The running time of our algorithms are almost linear in the domain size.We experimentally demonstrate that our algorithms outperform baseline algorithms in terms of the solution quality.", "full_text": "Monotone k-Submodular Function Maximization\n\nwith Size Constraints\n\nNaoto Ohsaka\n\nThe University of Tokyo\n\nohsaka@is.s.u-tokyo.ac.jp\n\nYuichi Yoshida\n\nNational Institute of Informatics, and\n\nPreferred Infrastructure, Inc.\nyyoshida@nii.ac.jp\n\nAbstract\n\nA k-submodular function is a generalization of a submodular function, where the\ninput consists of k disjoint subsets, instead of a single subset, of the domain.\nMany machine learning problems, including in\ufb02uence maximization with k kinds\nof topics and sensor placement with k kinds of sensors, can be naturally modeled\nas the problem of maximizing monotone k-submodular functions. In this paper,\nwe give constant-factor approximation algorithms for maximizing monotone k-\nsubmodular functions subject to several size constraints. The running time of our\nalgorithms are almost linear in the domain size. 
We experimentally demonstrate that our algorithms outperform baseline algorithms in terms of the solution quality.

1 Introduction

The task of selecting a set of items subject to constraints on the size or the cost of the set arises in many machine learning problems. The objective can often be modeled as maximizing a function with the diminishing return property, where, for a finite set V , a function f : 2V \u2192 R satisfies the diminishing return property if

f (S \u222a {e}) \u2212 f (S) \u2265 f (T \u222a {e}) \u2212 f (T )

for any S \u2286 T and e \u2208 V \\ T . For example, sensor placement [13, 14], influence maximization in social networks [11], document summarization [15], and feature selection [12] involve objectives satisfying the diminishing return property. It is well known that the diminishing return property is equivalent to submodularity, where a function f : 2V \u2192 R is submodular if

f (S) + f (T ) \u2265 f (S \u2229 T ) + f (S \u222a T )

holds for any S, T \u2286 V . When the objective function is submodular, and hence satisfies the diminishing return property, we can find in polynomial time a solution with a provable guarantee on its quality, even under various constraints [2, 3, 18, 21].
In many practical applications, however, we want to select several disjoint sets of items instead of a single set. To see this, let us describe two examples:
Influence maximization: Viral marketing is a cost-effective marketing strategy that promotes products by giving free (or discounted) items to a selected group of highly influential people, in the hope that, through word-of-mouth effects, a large number of product adoptions will occur [4, 19]. Suppose that we have k kinds of items, each having a different topic and thus a different word-of-mouth effect. 
Then, we want to distribute these items to B people selected from a group V of n people so as to maximize the (expected) number of product adoptions. It is natural to impose a constraint that each person can receive at most one item, since giving many free items to one particular person would be unfair.
Sensor placement: There are k kinds of sensors for different measures such as temperature, humidity, and illuminance. Suppose that we have Bi many sensors of the i-th kind for each i \u2208 {1, 2, . . . , k}, and there is a set V of n locations, each of which can be instrumented with exactly one sensor. Then, we want to allocate those sensors so as to maximize the information gain.
When k = 1, these problems can be modeled as maximizing monotone submodular functions [11, 14] and admit polynomial-time (1 \u2212 1/e)-approximation [18]. Unfortunately, however, the case of general k cannot be modeled as maximizing submodular functions, and we cannot apply the methods in the literature on maximizing submodular functions [2, 3, 18, 21]. We note that the problem of selecting k disjoint sets can sometimes be modeled as maximizing monotone submodular functions over the extended domain [k] \u00d7 V subject to a partition matroid. Although (1 \u2212 1/e)-approximation algorithms are known [3, 5], the running time is around O(k^8 n^8), which is prohibitively slow.

Our contributions: To address the problem of selecting k disjoint sets, we use the fact that the objectives can often be modeled as k-submodular functions. Let (k + 1)V := {(X1, . . . , Xk) | Xi \u2286 V \u2200i \u2208 {1, 2, . . . , k}, Xi \u2229 Xj = \u2205 \u2200i \u2260 j} be the family of k disjoint sets. Then, a function f : (k + 1)V \u2192 R is called k-submodular [9] if, for any x = (X1, . . . , Xk) and y = (Y1, . . . , Yk) in (k + 1)V , we have

f (x) + f (y) \u2265 f (x \u2294 y) + f (x \u2293 y),

where

x \u2293 y := (X1 \u2229 Y1, . . . , Xk \u2229 Yk),
x \u2294 y := (X1 \u222a Y1 \\ \u22c3_{i\u22601}(Xi \u222a Yi), . . . , Xk \u222a Yk \\ \u22c3_{i\u2260k}(Xi \u222a Yi)).

Roughly speaking, k-submodularity captures the property that, if we choose exactly one set Xe \u2208 {X1, . . . , Xk} that an element e can belong to for each e \u2208 V , then the resulting function is submodular (see Section 2 for details). When k = 1, k-submodularity coincides with submodularity.
In this paper, we give approximation algorithms for maximizing non-negative monotone k-submodular functions with several constraints on the sizes of the k sets. Here, we say that f is monotone if f (x) \u2264 f (y) for any x = (X1, . . . , Xk) and y = (Y1, . . . , Yk) with Xi \u2286 Yi for each i \u2208 {1, . . . , k}. Let n = |V | be the size of the domain. For the total size constraint, under which the total size of the k sets is bounded by B \u2208 Z+, we show that a simple greedy algorithm outputs a 1/2-approximation in O(knB) time. The approximation ratio of 1/2 is asymptotically tight, since the lower bound of (k+1)/(2k) + \u03b5 for any \u03b5 > 0 is known even when B = n [10]. Combining the random sampling technique [17], we also give a randomized algorithm that outputs a 1/2-approximation with probability at least 1 \u2212 \u03b4 in O(kn log B log(B/\u03b4)) time. Hence, even when B is as large as n, the running time is almost linear in n. For the individual size constraint, under which the size of the i-th set is bounded by Bi \u2208 Z+ for each i \u2208 {1, . . . , k}, we give a 1/3-approximation algorithm with running time O(knB), where B = \u2211_{i=1}^{k} Bi. 
We then give a randomized algorithm that outputs a 1/3-approximation with probability at least 1 \u2212 \u03b4 in O(k^2 n log(B/k) log(B/\u03b4)) time.
To show the practicality of our algorithms, we apply them to the influence maximization problem and the sensor placement problem, and we demonstrate that they outperform previous methods based on submodular function maximization and several baseline methods in terms of the solution quality.

Related work: When k = 2, k-submodularity is called bisubmodularity, and [20] applied bisubmodular functions to machine learning problems. However, their algorithms do not have any approximation guarantee. Huber and Kolmogorov introduced k-submodularity as a generalization of submodularity and bisubmodularity [9], and minimizing k-submodular functions was successfully used in a computer vision application [8]. Iwata et al. [10] gave a 1/2-approximation algorithm and a k/(2k\u22121)-approximation algorithm for maximizing non-monotone and monotone k-submodular functions, respectively, when there is no constraint.

Organization: The rest of this paper is organized as follows. In Section 2, we review properties of k-submodular functions. Sections 3 and 4 are devoted to 1/2-approximation algorithms for the total size constraint and 1/3-approximation algorithms for the individual size constraint, respectively. We show our experimental results in Section 5. We conclude our paper in Section 6.

Algorithm 1 k-Greedy-TS
Input: a monotone k-submodular function f : (k + 1)V \u2192 R+ and an integer B \u2208 Z+.
Output: a vector s with |supp(s)| = B.
1: s \u2190 0.
2: for j = 1 to B do
3:   (e, i) \u2190 arg max_{e\u2208V \\ supp(s), i\u2208[k]} \u2206e,i f (s).
4:   s(e) \u2190 i.
5: return s.

2 Preliminaries

For an integer k \u2208 N, [k] denotes the set {1, 2, . . . , k}. We define a partial order \u2aaf on (k + 1)V so that, for x = (X1, . . . 
, Xk) and y = (Y1, . . . , Yk) in (k + 1)V , x \u2aaf y if Xi \u2286 Yi for every i \u2208 [k]. We also define

\u2206e,i f (x) = f (X1, . . . , Xi\u22121, Xi \u222a {e}, Xi+1, . . . , Xk) \u2212 f (X1, . . . , Xk)

for x \u2208 (k + 1)V , e \u2209 \u22c3_{\u2113\u2208[k]} X\u2113, and i \u2208 [k], which is the marginal gain when adding e to the i-th set of x. Then, it is easy to see that the monotonicity of f is equivalent to \u2206e,i f (x) \u2265 0 for any x = (X1, . . . , Xk), e \u2209 \u22c3_{\u2113\u2208[k]} X\u2113, and i \u2208 [k]. Also, it is not hard to show (see [22] for details) that the k-submodularity of f implies the orthant submodularity, i.e.,

\u2206e,i f (x) \u2265 \u2206e,i f (y)

for any x, y \u2208 (k + 1)V with x \u2aaf y, e \u2209 \u22c3_{\u2113\u2208[k]} Y\u2113, and i \u2208 [k], and the pairwise monotonicity, i.e.,

\u2206e,i f (x) + \u2206e,j f (x) \u2265 0

for any x \u2208 (k + 1)V , e \u2209 \u22c3_{\u2113\u2208[k]} X\u2113, and i, j \u2208 [k] with i \u2260 j. Actually, the converse holds:

Theorem 2.1 (Ward and \u017divn\u00fd [22]). A function f : (k + 1)V \u2192 R is k-submodular if and only if f is orthant submodular and pairwise monotone.

It is often convenient to identify (k + 1)V with {0, 1, . . . , k}V when analyzing k-submodular functions. Namely, we associate (X1, . . . , Xk) \u2208 (k + 1)V with x \u2208 {0, 1, . . . , k}V by Xi = {e \u2208 V | x(e) = i} for i \u2208 [k]. Hence, we sometimes abuse notation and simply write x = (X1, . . . , Xk), regarding a vector x as k disjoint subsets of V . We define the support of x \u2208 {0, 1, . . . , k}V as supp(x) = {e \u2208 V | x(e) \u2260 0}. Analogously, for x \u2208 {0, 1, . . . , k}V and i \u2208 [k], we define suppi(x) = {e \u2208 V | x(e) = i}. Let 0 be the zero vector in {0, 1, . . . , k}V .

3 Maximizing k-submodular Functions with the Total Size Constraint

In this section, we give a 1/2-approximation algorithm for the problem of maximizing monotone k-submodular functions subject to the total size constraint. Namely, we consider

max f (x) subject to |supp(x)| \u2264 B and x \u2208 (k + 1)V ,

where f : (k + 1)V \u2192 R+ is monotone k-submodular and B \u2208 Z+ is a non-negative integer.

3.1 A greedy algorithm

The first algorithm we propose is a simple greedy algorithm (Algorithm 1). We show the following:

Theorem 3.1. Algorithm 1 outputs a 1/2-approximate solution by evaluating f O(knB) times, where n = |V |.

The number of evaluations of f is clearly O(knB). Hence, in what follows, we focus on analyzing the approximation ratio of Algorithm 1. Our analysis is based on the framework of [10].

Algorithm 2 k-Stochastic-Greedy-TS
Input: a monotone k-submodular function f : (k + 1)V \u2192 R+, an integer B \u2208 Z+, and a failure probability \u03b4 > 0.
Output: a vector s with |supp(s)| = B.
1: s \u2190 0.
2: for j = 1 to B do
3:   R \u2190 a random subset of size min{ ((n\u2212j+1)/(B\u2212j+1)) log(B/\u03b4), n } uniformly sampled from V \\ supp(s).
4:   (e, i) \u2190 arg max_{e\u2208R, i\u2208[k]} \u2206e,i f (s).
5:   s(e) \u2190 i.
6: return s.

Consider the j-th iteration of the for loop from Line 2. Let (e(j), i(j)) \u2208 V \u00d7 [k] be the pair greedily chosen in this iteration, and let s(j) be the solution after this iteration. We define s(0) = 0. Let o be the optimal solution. We iteratively define o(0) = o, o(1), . . . , o(B) as follows. For each j \u2208 [B], let S(j) = supp(o(j\u22121)) \\ supp(s(j\u22121)). Then, we set o(j) = e(j) if e(j) \u2208 S(j), and set o(j) to be an arbitrary element in S(j) otherwise. 
Then, we define o(j\u22121/2) as the resulting vector obtained from o(j\u22121) by assigning 0 to the o(j)-th element, and then define o(j) as the resulting vector obtained from o(j\u22121/2) by assigning i(j) to the e(j)-th element. Note that |supp(o(j))| = B holds for every j \u2208 {0, 1, . . . , B} and o(B) = s(B) = s. Moreover, we have s(j\u22121) \u2aaf o(j\u22121/2) for every j \u2208 [B].

Proof of Theorem 3.1. We first show that, for each j \u2208 [B],

f (s(j)) \u2212 f (s(j\u22121)) \u2265 f (o(j\u22121)) \u2212 f (o(j)).  (1)

For each j \u2208 [B], let y(j) = \u2206e(j),i(j) f (s(j\u22121)), a(j\u22121/2) = \u2206o(j),o(j\u22121)(o(j)) f (o(j\u22121/2)), and a(j) = \u2206e(j),i(j) f (o(j\u22121/2)). Then, note that f (s(j)) \u2212 f (s(j\u22121)) = y(j) and f (o(j\u22121)) \u2212 f (o(j)) = a(j\u22121/2) \u2212 a(j). From the monotonicity of f , it suffices to show that y(j) \u2265 a(j\u22121/2). Since e(j) and i(j) are chosen greedily, we have y(j) \u2265 \u2206o(j),o(j\u22121)(o(j)) f (s(j\u22121)). Since s(j\u22121) \u2aaf o(j\u22121/2), we have \u2206o(j),o(j\u22121)(o(j)) f (s(j\u22121)) \u2265 a(j\u22121/2) from the orthant submodularity. Combining these two inequalities, we establish (1).
Then, we have

f (o) \u2212 f (s) = \u2211_{j=1}^{B} (f (o(j\u22121)) \u2212 f (o(j))) \u2264 \u2211_{j=1}^{B} (f (s(j)) \u2212 f (s(j\u22121))) = f (s) \u2212 f (0) \u2264 f (s),

which implies f (s) \u2265 f (o)/2.

3.2 An almost linear-time algorithm by random sampling

In this section, we improve the number of evaluations of f from O(knB) to O(kn log B log(B/\u03b4)), where \u03b4 > 0 is a failure probability.
Our algorithm is shown in Algorithm 2. 
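As a concrete illustration of the greedy rule of Algorithm 1, and of the sampling idea behind Algorithm 2, the following is a minimal sketch. It is not the paper's C++ implementation; the function names, the optional `sample_size` interface, and the toy coverage objective (which is monotone k-submodular, since it is a monotone submodular coverage function over element-type pairs) are all ours.

```python
import random
from itertools import product

def greedy_k_submodular(V, k, B, f, sample_size=None, rng=None):
    """Greedy for max f(x) s.t. |supp(x)| <= B, where x maps element -> type in 1..k.
    If sample_size is given, each round scans only a random subset of the
    unassigned elements, mimicking the stochastic variant (Algorithm 2)."""
    x = {}
    for _ in range(B):
        remaining = [e for e in V if e not in x]
        if sample_size is not None:
            remaining = (rng or random).sample(remaining, min(sample_size, len(remaining)))
        best, best_gain = None, -1.0
        for e, i in product(remaining, range(1, k + 1)):
            gain = f({**x, e: i}) - f(x)  # marginal gain Delta_{e,i} f(x)
            if gain > best_gain:
                best, best_gain = (e, i), gain
        if best is None:
            break
        x[best[0]] = best[1]
    return x

# Toy monotone k-submodular objective: assigning element e the type i
# covers the set COVER[e][i]; f counts the covered ground items.
COVER = {
    "a": {1: {1, 2}, 2: {2, 3}},
    "b": {1: {3}, 2: {4, 5}},
    "c": {1: {1}, 2: {5}},
}

def f_cover(x):
    covered = set()
    for e, i in x.items():
        covered |= COVER[e][i]
    return len(covered)

solution = greedy_k_submodular(list(COVER), k=2, B=2, f=f_cover)
print(solution, f_cover(solution))
```

With `sample_size=None` the loop evaluates f at most knB times, matching the bound of Theorem 3.1 for this sketch.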
The main difference from Algorithm 1 is that we sample a sufficiently large subset R of V and then greedily assign a value by looking only at elements in R.
We reuse the notations e(j), i(j), S(j), and s(j) from Section 3.1, and let R(j) be R in the j-th iteration. We iteratively define o(0) = o, o(1), . . . , o(B) as follows. If R(j) \u2229 S(j) is empty, then we regard the algorithm as having failed. Suppose that R(j) \u2229 S(j) is non-empty. Then, we set o(j) = e(j) if e(j) \u2208 R(j) \u2229 S(j), and set o(j) to be an arbitrary element in R(j) \u2229 S(j) otherwise. Finally, we define o(j\u22121/2) and o(j) as in Section 3.1 using o(j\u22121), o(j), and e(j).
If the algorithm does not fail and o(1), . . . , o(B) are well defined, or in other words, if R(j) \u2229 S(j) is not empty for every j \u2208 [B], then the rest of the analysis is completely the same as in Section 3.1, and we achieve an approximation ratio of 1/2. Hence, it suffices to show that o(1), . . . , o(B) are well defined with high probability.

Lemma 3.2. With probability at least 1 \u2212 \u03b4, we have R(j) \u2229 S(j) \u2260 \u2205 for every j \u2208 [B].

Proof. Fix j \u2208 [B]. If |R(j)| = n, then we clearly have Pr[R(j) \u2229 S(j) = \u2205] = 0. Otherwise, since |S(j)| \u2265 B \u2212 j + 1 and |V \\ supp(s(j\u22121))| = n \u2212 j + 1, we have

Pr[R(j) \u2229 S(j) = \u2205] = (1 \u2212 |S(j)| / |V \\ supp(s(j\u22121))|)^{|R(j)|} \u2264 e^{\u2212((B\u2212j+1)/(n\u2212j+1)) \u00b7 ((n\u2212j+1)/(B\u2212j+1)) log(B/\u03b4)} = \u03b4/B.

By the union bound over j \u2208 [B], the lemma follows.

Theorem 3.3. Algorithm 2 outputs a 1/2-approximate solution with probability at least 1 \u2212 \u03b4 by evaluating f at most O(k(n \u2212 B) log B log(B/\u03b4)) times.

Proof. By Lemma 3.2 and the analysis in Section 3.1, Algorithm 2 outputs a 1/2-approximate solution with probability at least 1 \u2212 \u03b4. The number of evaluations of f is at most

k \u2211_{j\u2208[B]} ((n \u2212 j + 1)/(B \u2212 j + 1)) log(B/\u03b4) = k \u2211_{j\u2208[B]} ((n \u2212 B + j)/j) log(B/\u03b4) = O(kn log B log(B/\u03b4)).

4 Maximizing k-submodular Functions with the Individual Size Constraint

In this section, we consider the problem of maximizing monotone k-submodular functions subject to the individual size constraint. Namely, we consider

max f (x) subject to |suppi(x)| \u2264 Bi \u2200i \u2208 [k] and x \u2208 (k + 1)V ,

where f : (k + 1)V \u2192 R+ is monotone k-submodular, and B1, . . . , Bk \u2208 Z+ are non-negative integers.

4.1 A greedy algorithm

We first consider a simple greedy algorithm, described in Algorithm 3.

Algorithm 3 k-Greedy-IS
Input: a monotone k-submodular function f : (k + 1)V \u2192 R+ and integers B1, . . . , Bk \u2208 Z+.
Output: a vector s with |suppi(s)| = Bi for each i \u2208 [k].
1: s \u2190 0 and B \u2190 \u2211_{i\u2208[k]} Bi.
2: for j = 1 to B do
3:   I \u2190 {i \u2208 [k] | |suppi(s)| < Bi}.
4:   (e, i) \u2190 arg max_{e\u2208V \\ supp(s), i\u2208I} \u2206e,i f (s).
5:   s(e) \u2190 i.
6: return s.

We show the following:

Theorem 4.1. Algorithm 3 outputs a 1/3-approximate solution by evaluating f at most O(knB) times.

It is clear that the number of evaluations of f is O(knB). The analysis of the approximation ratio is given in Appendix A.

4.2 An almost linear-time algorithm by random sampling

We next improve the number of evaluations of f from O(knB) to O(k^2 n log(B/k) log(B/\u03b4)). Our algorithm is given in Algorithm 4. In Appendix A, we show the following.

Theorem 4.2. Algorithm 4 outputs a 1/3-approximate solution with probability at least 1 \u2212 \u03b4 by evaluating f at most O(k^2 n log(B/k) log(B/\u03b4)) times.

Algorithm 4 k-Stochastic-Greedy-IS
Input: a monotone k-submodular function f : (k + 1)V \u2192 R+, integers B1, . . . , Bk \u2208 Z+, and a failure probability \u03b4 > 0.
Output: a vector s with |suppi(s)| = Bi for each i \u2208 [k].
1: s \u2190 0 and B \u2190 \u2211_{i\u2208[k]} Bi.
2: for j = 1 to B do
3:   I \u2190 {i \u2208 [k] | |suppi(s)| < Bi} and R \u2190 \u2205.
4:   loop
5:     Add a random element in V \\ (supp(s) \u222a R) to R.
6:     if |R| \u2265 min{ ((n \u2212 |suppi(s)|)/(Bi \u2212 |suppi(s)|)) log(B/\u03b4), n } then
7:       break the loop.
8:   (e, i) \u2190 arg max_{e\u2208R, i\u2208I} \u2206e,i f (s).
9:   s(e) \u2190 i.
10: return s.

5 Experiments

In this section, we experimentally demonstrate that our algorithms outperform baseline algorithms and that our almost linear-time algorithms significantly improve efficiency in practice. We conducted experiments on a Linux server with an Intel Xeon E5-2690 (2.90 GHz) and 264 GB of main memory. We implemented all algorithms in C++. We measured the computational cost in terms of the number of function evaluations so that we can compare the efficiency of different methods independently of concrete implementations.

5.1 Influence maximization with k topics under the total size constraint

We first apply our algorithms to the problem of maximizing the spread of influence on several topics. First, we describe our information diffusion model, called the k-topic independent cascade (k-IC) model, which generalizes the independent cascade model [6, 7]. In the k-IC model, there are k kinds of items, each having a different topic, and thus k kinds of rumors independently spread through a social network. Let G = (V, E) be a social network with an edge probability p^i_{u,v} for each edge (u, v) \u2208 E, representing the strength of influence from u to v on the i-th topic. Given a seed s \u2208 (k + 1)V , for each i \u2208 [k], the diffusion process of the rumor about the i-th topic starts by activating the vertices in suppi(s), independently of the other topics. The process then unfolds in discrete steps according to the following randomized rule: when a vertex u becomes active in step t for the first time, it is given a single chance to activate each currently inactive vertex v, and it succeeds with probability p^i_{u,v}. If u succeeds, then v becomes active in step t + 1. Whether or not u succeeds, it cannot make any further attempt to activate v in subsequent steps. The process runs until no more activation is possible.
The influence spread \u03c3 : (k + 1)V \u2192 R+ in the k-IC model is defined as the expected total number of vertices that eventually become active in one of the k diffusion processes given a seed s, namely, \u03c3(s) = E[ | \u22c3_{i\u2208[k]} Ai(suppi(s)) | ], where Ai(suppi(s)) is a random variable representing the set of activated vertices in the diffusion process of the i-th topic. Given a directed graph G = (V, E), edge probabilities p^i_{u,v} ((u, v) \u2208 E, i \u2208 [k]), and a budget B, the problem is to select a seed s \u2208 (k + 1)V that maximizes \u03c3(s) subject to |supp(s)| \u2264 B. It is easy to see that the influence spread function \u03c3 is monotone k-submodular (see Appendix B for the proof).

Experimental settings: We use a publicly available real-world dataset of the social news website Digg.1 This dataset consists of a directed graph, where each vertex represents a user and each edge represents the friendship between a pair of users, and a log of user votes for stories. 
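The k-IC process just defined can be estimated by plain Monte Carlo simulation, which is also how the influence spread is approximated in the experiments below. The following is a toy sketch of one such estimator; the graph, the probabilities, and all names are illustrative assumptions of ours, not the paper's implementation.

```python
import random

def simulate_kic(edges, seed_assignment, k, rng):
    """One run of the k-IC process: k independent cascades, one per topic.
    edges[(u, v)][i] is the activation probability p^i_{u,v} (toy data)."""
    total_active = set()
    for i in range(1, k + 1):
        active = {v for v, topic in seed_assignment.items() if topic == i}
        frontier = list(active)
        while frontier:
            nxt = []
            for u in frontier:
                for (a, b), probs in edges.items():  # O(E) scan; fine for a toy graph
                    if a == u and b not in active and rng.random() < probs[i]:
                        active.add(b)
                        nxt.append(b)
            frontier = nxt
        total_active |= active  # union over the k topics
    return len(total_active)

def estimate_spread(edges, seed_assignment, k, runs=1000, seed=0):
    """Monte Carlo estimate of the influence spread sigma(s)."""
    rng = random.Random(seed)
    return sum(simulate_kic(edges, seed_assignment, k, rng) for _ in range(runs)) / runs

edges = {("u", "v"): {1: 0.5, 2: 0.1}, ("v", "w"): {1: 0.3, 2: 0.9}}
print(estimate_spread(edges, {"u": 1}, k=2))
```

Plugging `estimate_spread` in as the objective f of a greedy routine recovers the overall pipeline used in this section, up to the variance caveats discussed in the results.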
We set the number of topics k to be 10, and estimated the edge probabilities on each topic from the log using the method of [1]. We set the value of B to 5, 10, . . . , 100 and compared the following algorithms:

\u2022 k-Greedy-TS (Algorithm 1).
\u2022 k-Stochastic-Greedy-TS (Algorithm 2). We chose \u03b4 = 0.1.
\u2022 Single(i): Greedily choose B vertices only considering the i-th topic and assign them items of the i-th topic.
\u2022 Degree: Choose B vertices in decreasing order of degrees and assign them items of random topics.
\u2022 Random: Randomly choose B vertices and assign them items of random topics.

For the first three algorithms, we implemented the lazy evaluation technique [16] for efficiency. For k-Greedy-TS, we maintain an upper bound on the gain of inserting each pair (e, i) to apply the lazy evaluation technique directly. For k-Stochastic-Greedy-TS, we maintain an upper bound on the gain for each pair (e, i), and we pick a pair in R with the largest gain in each iteration. During the run of the algorithms, the influence spread was approximated by simulating the diffusion process 100 times. When the algorithms terminate, we simulated the diffusion process 10,000 times to obtain sufficiently accurate estimates of the influence spread.

Figure 1: Comparison of influence spreads.

Figure 2: The number of influence estimations.

Results: Figure 1 shows the influence spread achieved by each algorithm. We only show Single(3) among the Single(i) strategies since its influence spread is the largest. k-Greedy-TS and k-Stochastic-Greedy-TS clearly outperform the other methods, owing to their theoretical guarantee on the solution quality. Note that our two methods simulated the diffusion process only 100 times to choose a seed set, which is relatively small, because of the high computation cost. Consequently, the approximate value of the influence spread has a relatively high variance, and this might have caused the greedy method to choose seeds with small influence spreads. Remark that Single(3) works worse than Degree for B larger than 35, which means that focusing on a single topic may significantly degrade the influence spread. Random shows a poor performance, as expected.
Figure 2 reports the number of influence estimations of the greedy algorithms. We note that k-Stochastic-Greedy-TS outperforms k-Greedy-TS, which implies that the random sampling technique is effective even when combined with the lazy evaluation technique. The number of evaluations of k-Greedy-TS drastically increases when B is around 40, since we run out of influential vertices and need to reevaluate the remaining vertices. Indeed, the slope of k-Greedy-TS after B = 40 is almost constant in Figure 1, which indicates that the remaining vertices have similar influence. Single(3) is faster than our algorithms since it only considers a single topic.

1 http://www.isi.edu/~lerman/downloads/digg2009.html

5.2 Sensor placement with k kinds of measures under the individual size constraint

Next, we apply our algorithms for maximizing k-submodular functions with the individual size constraint to the sensor placement problem, which allows multiple kinds of sensors. In this problem, we want to determine the placement of multiple kinds of sensors that most effectively reduces the expected uncertainty. We need several notions to describe our model. Let \u2126 = {X1, X2, . . . , Xn} be a set of discrete random variables. The entropy of a subset S of \u2126 is defined as H(S) = \u2212\u2211_{s\u2208dom S} Pr[s] log Pr[s]. The conditional entropy of \u2126 having observed S is H(\u2126 | S) := H(\u2126) \u2212 H(S). Hence, in order to reduce the uncertainty of \u2126, we want to find a set S with as large an entropy as possible.
Now we formalize the sensor placement problem. 
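The joint entropy H(S) used here can be estimated empirically from joint samples of the discretized readings, in the spirit of the binning described below. The following is a minimal sketch; the toy samples, the binning, and the function name are ours, not the paper's implementation.

```python
import math
from collections import Counter

def entropy(samples, indices):
    """Empirical joint entropy H(S) = -sum_s Pr[s] log Pr[s] (in bits) of the
    variables at the given indices, estimated from joint samples (rows)."""
    counts = Counter(tuple(row[i] for i in indices) for row in samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Toy joint samples of three discretized variables (e.g., temperature bins).
samples = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
print(entropy(samples, [0]))     # one fair binary variable -> 1 bit
print(entropy(samples, [0, 2]))  # variable 2 is determined by variable 0 -> still 1 bit
```

Maximizing such an empirical H over disjoint sensor sets is exactly the k-submodular objective f defined next.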
There are k kinds of sensors for different measures. Suppose that we want to allocate Bi many sensors of the i-th kind for each i \u2208 [k], and there is a set V of n locations, each of which can be instrumented with exactly one sensor. Let X^i_e be the random variable representing the observation collected from a sensor of the i-th kind if it is installed at the e-th location, and let \u2126 = {X^i_e}_{i\u2208[k], e\u2208V}. Then, the problem is to select x \u2208 (k + 1)V that maximizes f (x) = H( \u22c3_{e\u2208supp(x)} {X^{x(e)}_e} ) subject to |suppi(x)| \u2264 Bi for each i \u2208 [k]. It is easy to see that f is monotone k-submodular (see Appendix B for details).

Figure 3: Comparison of entropy.

Figure 4: The number of entropy evaluations.

Experimental settings: We use the publicly available Intel Lab dataset.2 This dataset contains a log of approximately 2.3 million readings collected from 54 sensors deployed in the Intel Berkeley research lab between February 28th and April 5th, 2004. We extracted temperature, humidity, and light values from each reading and discretized these values into bins of 2 degrees Celsius each, 5 points each, and 100 luxes each, respectively. Hence, there are k = 3 kinds of sensors to be allocated to n = 54 locations. Budgets for sensors measuring temperature, humidity, and light are denoted by B1, B2, and B3. We set B1 = B2 = B3 = b, where b is a parameter varying from 1 to 18. We compare the following algorithms:

\u2022 k-Greedy-IS (Algorithm 3).
\u2022 k-Stochastic-Greedy-IS (Algorithm 4). 
We chose \u03b4 = 0.1.
\u2022 Single(i): Allocate sensors of the i-th kind to greedily chosen \u2211_j Bj places.

We again implemented these algorithms with the lazy evaluation technique, in a similar way to the previous experiment. Also note that the Single(i) strategies do not satisfy the individual size constraint.

Results: Figure 3 shows the entropy achieved by each algorithm. k-Greedy-IS and k-Stochastic-Greedy-IS clearly outperform the Single(i) strategies. The maximum gap between the entropies achieved by k-Greedy-IS and k-Stochastic-Greedy-IS is only 0.18%.
Figure 4 shows the number of entropy evaluations of each algorithm. We observe that k-Stochastic-Greedy-IS outperforms k-Greedy-IS. Especially when b = 18, the number of entropy evaluations is reduced by 31%. The Single(i) strategies are faster because they only consider sensors of a fixed kind.

6 Conclusions

Motivated by real-world applications, we proposed approximation algorithms for maximizing monotone k-submodular functions. Our algorithms run in almost linear time and achieve approximation ratios of 1/2 for the total size constraint and 1/3 for the individual size constraint. We empirically demonstrated that our algorithms outperform baseline methods for maximizing submodular functions in terms of the solution quality. Improving the approximation ratio of 1/3 for the individual size constraint, or showing that it is tight, is an interesting open problem.

Acknowledgments

Y. Y. is supported by JSPS Grant-in-Aid for Young Scientists (B) (No. 26730009), MEXT Grant-in-Aid for Scientific Research on Innovative Areas (24106003), and JST, ERATO, Kawarabayashi Large Graph Project. N. O. 
is supported by JST, ERATO, Kawarabayashi Large Graph Project.

2 http://db.csail.mit.edu/labdata/labdata.html

References

[1] N. Barbieri, F. Bonchi, and G. Manco. Topic-aware social influence propagation models. In ICDM, pages 81\u201390, 2012.
[2] N. Buchbinder, M. Feldman, J. S. Naor, and R. Schwartz. A tight linear time (1/2)-approximation for unconstrained submodular maximization. In FOCS, pages 649\u2013658, 2012.
[3] G. Calinescu, C. Chekuri, M. P\u00e1l, and J. Vondr\u00e1k. Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing, 40(6):1740\u20131766, 2011.
[4] P. Domingos and M. Richardson. Mining the network value of customers. In KDD, pages 57\u201366, 2001.
[5] Y. Filmus and J. Ward. Monotone submodular maximization over a matroid via non-oblivious local search. SIAM Journal on Computing, 43(2):514\u2013542, 2014.
[6] J. Goldenberg, B. Libai, and E. Muller. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing Letters, 12(3):211\u2013223, 2001.
[7] J. Goldenberg, B. Libai, and E. Muller. Using complex systems analysis to advance marketing theory development: Modeling heterogeneity effects on new product growth through stochastic cellular automata. Academy of Marketing Science Review, 9(3):1\u201318, 2001.
[8] I. Gridchyn and V. Kolmogorov. Potts model, parametric maxflow and k-submodular functions. In ICCV, pages 2320\u20132327, 2013.
[9] A. Huber and V. Kolmogorov. Towards minimizing k-submodular functions. In Combinatorial Optimization, pages 451\u2013462. Springer Berlin Heidelberg, 2012.
[10] S. Iwata, S. Tanigawa, and Y. Yoshida. 
Improved approximation algorithms for k-submodular\n\nfunction maximization. In SODA, 2016. to appear.\n\n[11] D. Kempe, J. Kleinberg, and \u00b4E. Tardos. Maximizing the spread of in\ufb02uence through a social\n\nnetwork. In KDD, pages 137\u2013146, 2003.\n\n[12] C.-W. Ko, J. Lee, and M. Queyranne. An exact algorithm for maximum entropy sampling.\n\nOperations Research, 43(4):684\u2013691, 1995.\n\n[13] A. Krause, H. B. McMahon, C. Guestrin, and A. Gupta. Robust submodular observation\n\nselection. The Journal of Machine Learning Research, 9:2761\u20132801, 2008.\n\n[14] A. Krause, A. Singh, and C. Guestrin. Near-optimal sensor placements in gaussian processes:\nTheory, ef\ufb01cient algorithms and empirical studies. The Journal of Machine Learning Research,\n9:235\u2013284, 2008.\n\n[15] H. Lin and J. Bilmes. Multi-document summarization via budgeted maximization of submod-\n\nular functions. In NAACL/HLT, pages 912\u2013920, 2010.\n\n[16] M. Minoux. Accelerated greedy algorithms for maximizing submodular set functions. Opti-\n\nmization Techniques, 7:234\u2013243, 1978.\n\n[17] B. Mirzasoleiman, A. Badanidiyuru, A. Karbasi, J. Vondr\u00b4ak, and A. Krause. Lazier than lazy\n\ngreedy. In AAAI, pages 1812\u20131818, 2015.\n\n[18] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maxi-\n\nmizing submodular set functions\u2014I. Mathematical Programming, 14(1):265\u2013294, 1978.\n[19] M. Richardson and P. Domingos. Mining knowledge-sharing sites for viral marketing.\n\nIn\n\nKDD, pages 61\u201370, 2002.\n\n[20] A. P. Singh, A. Guillory, and J. A. Bilmes. On bisubmodular maximization. In AISTATS, pages\n\n1055\u20131063, 2012.\n\n[21] M. Sviridenko. A note on maximizing a submodular set function subject to a knapsack con-\n\nstraint. Operations Research Letters, 32(1):41\u201343, 2004.\n\n[22] J. Ward and S. \u02c7Zivn\u00b4y. Maximizing k-submodular functions and beyond. 
arXiv:1409.1399v1, 2014. A preliminary version appeared in SODA, pages 1468\u20131481, 2014.
", "award": [], "sourceid": 482, "authors": [{"given_name": "Naoto", "family_name": "Ohsaka", "institution": "The University of Tokyo"}, {"given_name": "Yuichi", "family_name": "Yoshida", "institution": "National Institute of Informatics and Preferred Infrastructure, Inc."}]}