{"title": "A Generalization of Submodular Cover via the Diminishing Return Property on the Integer Lattice", "book": "Advances in Neural Information Processing Systems", "page_first": 847, "page_last": 855, "abstract": "We consider a generalization of the submodular cover problem based on the concept of diminishing return property on the integer lattice. We are motivated by real scenarios in machine learning that cannot be captured by (traditional) submodular set functions.  We show that the generalized submodular cover problem can be applied to various problems and devise a bicriteria approximation algorithm.  Our algorithm is guaranteed to output a log-factor approximate solution that satisfies the constraints with the desired accuracy. The running time of our algorithm is roughly $O(n\\log (nr) \\log{r})$, where $n$ is the size of the ground set and $r$ is the maximum value of a coordinate.  The dependency on $r$ is exponentially better than the naive reduction algorithms. Several experiments on real and artificial datasets demonstrate that the solution quality of our algorithm is comparable to naive algorithms, while the running time is several orders of magnitude faster.", "full_text": "A Generalization of Submodular Cover via the Diminishing Return Property on the Integer Lattice\n\nTasuku Soma\n\nThe University of Tokyo\n\ntasuku soma@mist.i.u-tokyo.ac.jp\n\nYuichi Yoshida\n\nNational Institute of Informatics, and\n\nPreferred Infrastructure, Inc.\nyyoshida@nii.ac.jp\n\nAbstract\n\nWe consider a generalization of the submodular cover problem based on the concept of diminishing return property on the integer lattice. We are motivated by real scenarios in machine learning that cannot be captured by (traditional) submodular set functions. We show that the generalized submodular cover problem can be applied to various problems and devise a bicriteria approximation algorithm. 
Our algorithm is guaranteed to output a log-factor approximate solution that satisfies the constraints with the desired accuracy. The running time of our algorithm is roughly O(n log(nr) log r), where n is the size of the ground set and r is the maximum value of a coordinate. The dependency on r is exponentially better than that of the naive reduction algorithms. Several experiments on real and artificial datasets demonstrate that the solution quality of our algorithm is comparable to that of naive algorithms, while the running time is several orders of magnitude faster.\n\n1 Introduction\n\nA function f : 2^S → R_+ is called submodular if f(X) + f(Y) ≥ f(X ∪ Y) + f(X ∩ Y) for all X, Y ⊆ S, where S is a finite ground set. An equivalent and more intuitive definition is given by the diminishing return property: f(X ∪ {s}) − f(X) ≥ f(Y ∪ {s}) − f(Y) for all X ⊆ Y and s ∈ S \\ Y. In the last decade, the optimization of submodular functions has attracted particular interest in the machine learning community. One reason for this is that many real-world models naturally admit the diminishing return property. For example, document summarization [12, 13], influence maximization in viral marketing [7], and sensor placement [10] can be described with the concept of submodularity, and efficient algorithms have been devised by exploiting submodularity (for further details, refer to [8]).\n\nA variety of proposed models in machine learning [4, 13, 18] boil down to the submodular cover problem [21]: for given monotone and nonnegative submodular functions f, c : 2^S → R_+ and α > 0, we are to\n\nminimize c(X) subject to f(X) ≥ α.    (1)\n\nIntuitively, c(X) and f(X) represent the cost and the quality of a solution, respectively. The objective of this problem is to find a set X of minimum cost whose quality meets the worst quality guarantee α. 
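As a concrete instance of problem (1), consider set cover. Here f(X) counts the elements covered by the chosen sets, and c(X) = |X|. The Python sketch below is not code from this paper. It illustrates the usual cost-effective greedy rule (commonly attributed to Wolsey [21]), which repeatedly adds the element with the largest marginal gain per unit cost until f(X) ≥ α. The instance SETS and the helper names coverage and greedy_cover are hypothetical. The sketch assumes f(S) ≥ α, so that a feasible solution exists.

```python
SETS = {'A': {1, 2, 3}, 'B': {3, 4}, 'C': {4, 5, 6}, 'D': {1, 4}}  # hypothetical instance

def coverage(X):
    # f(X): number of elements covered by the chosen sets (monotone submodular)
    covered = set()
    for name in X:
        covered |= SETS[name]
    return len(covered)

def greedy_cover(ground, f, c, alpha):
    # Cost-effective greedy for: minimize c(X) subject to f(X) >= alpha.
    # Assumes f(ground) >= alpha and positive singleton costs c({s}).
    X = set()
    while f(X) < alpha:
        fX = f(X)
        best = max((s for s in ground if s not in X),
                   key=lambda s: (f(X | {s}) - fX) / c({s}))
        X.add(best)
    return X

chosen = greedy_cover(['A', 'B', 'C', 'D'], coverage, len, alpha=6)
print(sorted(chosen))  # ['A', 'C']
```

The rule first picks A, since A and C tie with gain 3 and ties are broken by list order. It then picks C, which covers the remaining three elements.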
Although this problem is NP-hard, since it generalizes the set cover problem, a simple greedy algorithm achieves a tight log-factor approximation and performs very well in practice.\n\nThe aforementioned submodular models are based on the submodularity of a set function, that is, a function defined on 2^S. However, we often encounter problems that cannot be captured by a set function. Let us give two examples:\n\nSensor Placement: Consider the following sensor placement scenario. Suppose that we have several types of sensors with various energy levels, and assume a simple trade-off between information gain and cost. Sensors of a high energy level can collect a considerable amount of information, but placing them is costly. Sensors of a low energy level can be placed at a low cost, but they can only gather limited information. In this scenario, we want to decide which type of sensor should be placed at each spot, rather than just deciding whether to place a sensor or not. Such a scenario is beyond the existing models based on submodular set functions.\n\nOptimal Budget Allocation: A similar situation arises in the optimal budget allocation problem [2]. In this problem, we want to allocate a budget among ad sources so that (at least) a certain number of customers is influenced while the total budget is minimized. Again, we have to decide how much budget should be set aside for each ad source, and hence set functions cannot capture the problem.\n\nWe note that a function f : 2^S → R_+ can be seen as a function defined on the Boolean hypercube {0, 1}^S. The above real scenarios then prompt us to generalize submodularity and the diminishing return property to functions defined on the integer lattice Z^S_+. 
The most natural generalization of the diminishing return property to a function f : Z^S_+ → R_+ is the following inequality:\n\nf(x + χ_s) − f(x) ≥ f(y + χ_s) − f(y)    (2)\n\nfor x ≤ y and s ∈ S, where χ_s is the s-th unit vector. If f satisfies (2), then f also satisfies the following lattice submodular inequality:\n\nf(x) + f(y) ≥ f(x ∨ y) + f(x ∧ y)    (3)\n\nfor all x, y ∈ Z^S_+, where ∨ and ∧ are the coordinate-wise max and min operations, respectively. While submodularity and the diminishing return property are equivalent for set functions, this is not the case for functions over the integer lattice; the diminishing return property (2) is stronger than the lattice submodular inequality (3). We say that f is lattice submodular if f satisfies (3), and if f further satisfies (2) we say that f is diminishing return submodular (DR-submodular for short). One might feel that DR-submodularity (2) is too restrictive. However, since the diminishing return property is what matters most in applications, we may regard DR-submodularity (2) as the most natural generalization of submodularity, at least for the applications mentioned so far [17, 6]. For example, under a natural condition, the objective function in the optimal budget allocation problem satisfies (2) [17]. DR-submodularity was also considered in the context of submodular welfare [6].\n\nIn this paper, we consider the following generalization of the submodular cover problem for set functions: Given a monotone DR-submodular function f : Z^S_+ → R_+, a subadditive function c : Z^S_+ → R_+, α > 0, and r ∈ Z_+, we are to\n\nminimize c(x) subject to f(x) ≥ α, 0 ≤ x ≤ r1,    (4)\n\nwhere 1 denotes the all-ones vector and we say that c is subadditive if c(x + y) ≤ c(x) + c(y) for all x, y ∈ Z^S_+. We call problem (4) the DR-submodular cover problem. This problem encompasses problems that boil down to the submodular cover problem for set functions and their generalizations to the integer lattice. Furthermore, the cost function c is generalized to a subadditive function. In particular, we note that the two examples given above can be rephrased using this problem (see Section 4 for details).\n\nIf c is also monotone DR-submodular, one can reduce problem (4) to the set version (1) (for technical details, see Section 3.1). The drawback of this naive reduction is that it only yields a pseudo-polynomial time algorithm; the running time depends on r rather than log r. Since r can be huge in many practical settings (e.g., the maximum energy level of a sensor), even linear dependence on r could make an algorithm impractical. Furthermore, for a general subadditive function c, this naive reduction does not work.\n\n1.1 Our Contribution\n\nFor problem (4), we devise a bicriteria approximation algorithm based on the decreasing threshold technique of [3]. More precisely, our algorithm takes the additional parameters 0 < ε, δ < 1. The output x ∈ Z^S_+ of our algorithm is guaranteed to satisfy two conditions. First, c(x) is at most (1 + 3ε)ρ(1 + log(d/β)) times the optimum. Second, f(x) ≥ (1 − δ)α. Here ρ is the curvature of c (see Section 3 for the definition), d = max_s f(χ_s) is the maximum value of f over all standard unit vectors, and β is the minimum value of the positive increments of f in the feasible region.\n\nRunning Time (dependency on r): An important feature of our algorithm is that the running time depends on the bit length of r only polynomially, whereas the naive reduction algorithms depend on it exponentially, as mentioned above. 
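The log r factor comes from the binary search that Algorithm 1 (Section 3) uses to choose each step size. Binary search applies for a simple reason. For DR-submodular f, the average marginal f(kχ_s | x)/k is nonincreasing in k, so the step sizes that pass the threshold test form a prefix 1, ..., K. The generic Python sketch below is illustrative only. The helper largest_prefix_true and its stand-in predicate are hypothetical. It counts predicate evaluations for r = 100,000, the value of r used in Section 4.

```python
import math

def largest_prefix_true(pred, hi):
    # Largest k in 1..hi with pred(k) True, assuming pred is True on a
    # prefix 1..K and False afterwards; returns (K or None, #evaluations).
    lo, best, evals = 1, None, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        evals += 1
        if pred(mid):
            best, lo = mid, mid + 1
        else:
            hi = mid - 1
    return best, evals

r = 100_000
k, evals = largest_prefix_true(lambda k: k <= 31_337, r)
print(k, evals)  # a linear scan over k could need up to r evaluations
```

The number of evaluations is at most ⌊log2 r⌋ + 1. A naive unit-by-unit search can need up to r evaluations.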
More precisely, the running time of our algorithm is O((n/ε) log(nr c_max/(δ c_min)) log r), which is polynomial in the input size, whereas the naive algorithm is only a pseudo-polynomial time algorithm. In fact, our experiments using real and synthetic datasets show that our algorithm is considerably faster than naive algorithms. Furthermore, in terms of the objective value (that is, the cost of the output), our algorithm also exhibits comparable performance.\n\nApproximation Guarantee: Our approximation guarantee on the cost is almost tight. Note that the DR-submodular cover problem (4) includes the set cover problem, in which we are given a collection of sets and want to find a minimum number of sets that cover all the elements. In our context, S corresponds to the collection of sets, the cost c is the number of chosen sets, and f is the number of covered elements. It is known that we cannot obtain an o(log m)-approximation unless P = NP, where m is the number of elements [16]. However, since for the set cover problem we have ρ = 1, d = O(m), and β = 1, our approximation guarantee is O(log m).\n\n1.2 Related Work\n\nOur result can be compared with several results in the literature for the submodular cover problem for set functions. It is shown by Wolsey [21] that if c(X) = |X|, a simple greedy algorithm yields a (1 + log(d/β))-approximation, which coincides with our approximation ratio except for the (1 + 3ε) factor. Note that ρ = 1 when c(X) = |X|, or more generally, when c is modular. Recently, Wan et al. [20] discussed a slightly different setting, in which c is also submodular and both f and c are integer valued. They proved that the greedy algorithm achieves a ρH(d)-approximation, where H(d) = 1 + 1/2 + ··· + 1/d is the d-th harmonic number. 
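When f is integer valued (β ≥ 1), the standard bounds ln(d + 1) ≤ H(d) ≤ 1 + ln d show why ρH(d) and our ratio ρ(1 + log(d/β)) are close. The short Python check below illustrates this. It is a numeric illustration only, taking log to be the natural logarithm and using a hypothetical helper harmonic. It tabulates H(d) against 1 + ln d and ln(d + 1).

```python
import math

def harmonic(d):
    # d-th harmonic number H(d) = 1 + 1/2 + ... + 1/d
    return sum(1.0 / j for j in range(1, d + 1))

for d in (1, 10, 100, 10_000):
    h = harmonic(d)
    # columns: d, H(d), upper bound 1 + ln d, lower bound ln(d + 1)
    print(d, round(h, 3), round(1 + math.log(d), 3), round(math.log(d + 1), 3))
```

The gap between the two bounds is at most 1, so the ratio between H(d) and 1 + ln d tends to 1 as d grows.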
Again, their ratio asymptotically coincides with our approximation ratio (note that β ≥ 1 when f is integer valued, so log(d/β) ≤ log d).\n\nAnother common submodular-based model in machine learning takes the form of the submodular maximization problem. Given a monotone submodular set function f : {0, 1}^S → R_+ and a feasible set P ⊆ [0, 1]^S (e.g., a matroid polytope or a knapsack polytope), we want to maximize f(x) subject to x ∈ P ∩ {0, 1}^S. Such models are widely found in the tasks already described. We note that the submodular cover problem and the submodular maximization problem are, in a sense, dual to each other. Indeed, Iyer and Bilmes [5] showed that a bicriteria algorithm for one of these problems yields a bicriteria algorithm for the other. Parallel to our setting, generalizing the submodular maximization problem to the integer lattice Z^S_+ is a natural question. In this direction, Soma et al. [17] considered the maximization of lattice submodular functions (not necessarily DR-submodular) and devised a constant-factor approximation pseudo-polynomial time algorithm. We note that our result is not implied by [17] via the duality of [5]; such a reduction only yields a pseudo-polynomial time algorithm.\n\n1.3 Organization of This Paper\n\nThe rest of this paper is organized as follows. Section 2 sets out the mathematical basics of submodular functions over the integer lattice. Section 3 describes our algorithm and states our main theorem. In Section 4, we show various experimental results using real and artificial datasets. Section 5 sketches the proof of the main theorem. Finally, we conclude the paper in Section 6.\n\n2 Preliminaries\n\nLet S be a finite set. For each s ∈ S, we denote the s-th unit vector by χ_s; that is, χ_s(t) = 1 if t = s, and χ_s(t) = 0 otherwise. 
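Inequalities (2) and (3) from the introduction can be tested by brute force on a small box {0, ..., b}^n. The Python sketch below is only an illustration. A passing check on a finite box is evidence, not a proof. The examples capped_sum and first_squared are hypothetical. capped_sum(x) = min(x(1) + ... + x(n), 3) is a concave function of a sum and satisfies both (2) and (3). first_squared, the square of the first coordinate, satisfies (3) with equality but violates (2). This shows that (2) is strictly stronger than (3).

```python
import itertools

def grid(n, b):
    # all integer vectors in the box {0, ..., b}^n, as tuples
    return list(itertools.product(range(b + 1), repeat=n))

def plus_unit(x, s):
    # the vector x + chi_s
    return tuple(v + (1 if t == s else 0) for t, v in enumerate(x))

def is_lattice_submodular(f, n, b):
    # inequality (3): f(x) + f(y) >= f(x join y) + f(x meet y)
    for x, y in itertools.product(grid(n, b), repeat=2):
        join = tuple(map(max, x, y))
        meet = tuple(map(min, x, y))
        if f(x) + f(y) < f(join) + f(meet):
            return False
    return True

def is_dr_submodular(f, n, b):
    # inequality (2): f(x + chi_s) - f(x) >= f(y + chi_s) - f(y) for x <= y
    for x, y in itertools.product(grid(n, b), repeat=2):
        if all(u <= v for u, v in zip(x, y)):
            for s in range(n):
                if f(plus_unit(x, s)) - f(x) < f(plus_unit(y, s)) - f(y):
                    return False
    return True

def capped_sum(x):
    # concave function of the coordinate sum: a DR-submodular example
    return min(sum(x), 3)

def first_squared(x):
    # convex in the first coordinate: lattice submodular but not DR-submodular
    return x[0] ** 2

print(is_dr_submodular(capped_sum, 2, 3), is_lattice_submodular(capped_sum, 2, 3))
print(is_dr_submodular(first_squared, 2, 3), is_lattice_submodular(first_squared, 2, 3))
```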
A function f : Z^S → R is said to be lattice submodular if f(x) + f(y) ≥ f(x ∨ y) + f(x ∧ y) for all x, y ∈ Z^S. A function f is monotone if f(x) ≥ f(y) for all x, y ∈ Z^S with x ≥ y. For x, y ∈ Z^S and a function f : Z^S → R, we denote f(y | x) := f(y + x) − f(x). A function f is diminishing return submodular (or DR-submodular) if f(x + χ_s) − f(x) ≥ f(y + χ_s) − f(y) for each x ≤ y ∈ Z^S and s ∈ S. For a DR-submodular function f, one can immediately check that f(kχ_s | x) ≥ f(kχ_s | y) for arbitrary x ≤ y, s ∈ S, and k ∈ Z_+. A function f is subadditive if f(x + y) ≤ f(x) + f(y) for x, y ∈ Z^S. For each x ∈ Z^S_+, we define {x} to be the multiset in which each s ∈ S is contained x(s) times.\n\nIn [17], a lattice submodular function f : Z^S → R is said to have the diminishing return property if f is coordinate-wise concave: f(x + 2χ_s) − f(x + χ_s) ≤ f(x + χ_s) − f(x) for each x ∈ Z^S and s ∈ S. We note that our definition is consistent with [17]. Formally, we have the following lemma, whose proof can be found in the Appendix.\n\nLemma 2.1. A function f : Z^S → R is DR-submodular if and only if f is lattice submodular and coordinate-wise concave.\n\nThe following lemma is fundamental for monotone DR-submodular functions. Its proof is deferred to the Appendix due to space limitations.\n\nLemma 2.2. For a monotone DR-submodular function f, f(x) − f(y) ≤ ∑_{s∈{x}} f(χ_s | y) for arbitrary x, y ∈ Z^S_+.\n\n3 Algorithm for the DR-submodular Cover\n\nRecall the DR-submodular cover problem (4). Let f : Z^S_+ → R_+ be a monotone DR-submodular function and let c : Z^S_+ → R_+ be a subadditive cost function. The objective is to minimize c(x) subject to f(x) ≥ α and 0 ≤ x ≤ r1, where α > 0 and r ∈ Z_+ are given constants. Without loss of generality, we can assume that max{f(x) : 0 ≤ x ≤ r1} = α (otherwise, we can consider f̂(x) := min{f(x), α} instead of f). Furthermore, we can assume c(x) > 0 for any nonzero x ∈ Z^S_+.\n\nA pseudocode description of our algorithm is presented in Algorithm 1. The algorithm can be viewed as a modified version of the greedy algorithm and works as follows. We start with the initial solution x = 0 and increase each coordinate of x gradually. To determine the size of each increment, the algorithm maintains a threshold θ that is initialized to a sufficiently large value. For each s ∈ S, the algorithm finds the largest integer step size 0 < k ≤ r − x(s) such that the marginal gain-to-cost ratio f(kχ_s | x)/(k c(χ_s)) is at least the threshold θ. If such k exists, the algorithm updates x to x + kχ_s. After repeating this for each s ∈ S, the algorithm decreases the threshold θ by a factor of (1 − ε). If x becomes feasible, the algorithm returns the current x. Even if x never becomes feasible, the final x satisfies f(x) ≥ (1 − δ)α, provided we iterate until θ gets sufficiently small.\n\nAlgorithm 1 Decreasing Threshold for the DR-Submodular Cover Problem\nInput: f : Z^S_+ → R_+, c : Z^S_+ → R_+, r ∈ N, α > 0, ε > 0, δ > 0.\nOutput: 0 ≤ x ≤ r1 such that f(x) ≥ α.\n1: x ← 0, d ← max_{s∈S} f(χ_s), c_min ← min_{s∈S} c(χ_s), c_max ← max_{s∈S} c(χ_s)\n2: for (θ = d/c_min; θ ≥ δd/(n c_max r); θ ← θ(1 − ε)) do\n3:   for all s ∈ S do\n4:     Find the maximum integer 0 < k ≤ r − x(s) such that f(kχ_s | x)/(k c(χ_s)) ≥ θ with binary search.\n5:     If such k exists then x ← x + kχ_s.\n6:     If f(x) ≥ α then break the outer for loop.\n7: return x\n\nBefore stating the theorem, we need to define several parameters of f and c. Let β := min{f(χ_s | x) : s ∈ S, x ∈ Z^S_+, f(χ_s | x) > 0} and d := max_s f(χ_s). Let c_max := max_s c(χ_s) and c_min := min_s c(χ_s). Define the curvature of c to be\n\nρ := min_{x*: optimal solution} (∑_{s∈{x*}} c(χ_s)) / c(x*).    (5)\n\nDefinition 3.1. For γ ≥ 1 and 0 < δ < 1, a vector x ∈ Z^S_+ is a (γ, δ)-bicriteria approximate solution if c(x) ≤ γ · c(x*), f(x) ≥ (1 − δ)α, and 0 ≤ x ≤ r1.\n\nOur main theorem is stated below. We sketch the proof in Section 5.\n\nTheorem 3.2. Algorithm 1 outputs a ((1 + 3ε)ρ(1 + log(d/β)), δ)-bicriteria approximate solution in O((n/ε) log(nr c_max/(δ c_min)) log r) time.\n\n3.1 Discussion\n\nInteger-valued Case. 
Let us make a simple remark on the case that f is integer valued. Without loss of generality, we can assume α ∈ Z_+. Then, Algorithm 1 always returns a feasible solution for any 0 < δ < 1/α. The reason is that f(x) ≥ (1 − δ)α > α − 1, and integrality then forces f(x) ≥ α. Therefore, our algorithm can easily be turned into an approximation algorithm if f is integer valued.\n\nDefinition of Curvature. Several authors [5, 19] use a different notion of curvature called the total curvature. Its natural extension to functions over the integer lattice is as follows. The total curvature κ of c : Z^S_+ → R_+ is defined as κ := 1 − min_{s∈S} c(χ_s | r1 − χ_s)/c(χ_s). Note that if c is modular, then κ = 0 and ρ = 1. For example, Iyer and Bilmes [5] devised a bicriteria approximation algorithm whose approximation guarantee is roughly O((1 − κ)^{−1} log(d/β)). Let us investigate the relation between ρ and κ for DR-submodular functions. One can show that 1 − κ ≤ ρ ≤ (1 − κ)^{−1} (see Lemma E.1 in the Appendix), which means that our bound in terms of ρ is tighter than one in terms of (1 − κ)^{−1}.\n\nComparison to Naive Reduction Algorithm. If c is also a monotone DR-submodular function, one can reduce (4) to the set version (1) as follows. For each s ∈ S, create r copies of s and let S̃ be the set of these copies. For X̃ ⊆ S̃, define x_X̃ ∈ Z^S_+ to be the integral vector such that x_X̃(s) is the number of copies of s contained in X̃. Then, f̃(X̃) := f(x_X̃) is submodular. Similarly, c̃(X̃) := c(x_X̃) is also submodular if c is DR-submodular. Therefore, we may apply a standard greedy algorithm of [20, 21] to the reduced problem; this is exactly what Greedy does in our experiments (see Section 4). 
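The contrast between Algorithm 1 and the naive reduction can be seen on a toy instance. The Python sketch below is illustrative and is not the C++ implementation used in Section 4. It follows the steps of Algorithm 1 without lazy evaluation, for a modular cost given by per-coordinate unit costs c(χ_s). It compares this with the unit-step greedy that the reduction to S̃ amounts to when c(x) = ‖x‖_1. The test function f_sqrt, its weights W, and the parameter values are hypothetical. f_sqrt is separable and concave in each coordinate, hence monotone DR-submodular. Both routines count oracle calls to f.

```python
import math

def counted(f):
    # Wrap an oracle f so that its evaluations are counted in calls[0].
    calls = [0]
    def wrapped(x):
        calls[0] += 1
        return f(x)
    return wrapped, calls

def decreasing_threshold(f, c_unit, r, alpha, eps, delta):
    # Sketch of Algorithm 1 for a modular cost with c(chi_s) = c_unit[s].
    n = len(c_unit)
    x = [0] * n

    def plus(s, k):
        # the vector x + k chi_s
        y = list(x)
        y[s] += k
        return y

    d = max(f(plus(s, 1)) for s in range(n))  # x = 0 here, so d = max_s f(chi_s)
    cmin, cmax = min(c_unit), max(c_unit)
    theta, fx = d / cmin, f(x)
    while theta >= delta * d / (n * cmax * r):
        for s in range(n):
            # Binary search for the largest k in 1..r - x[s] whose average
            # gain per unit cost f(k chi_s | x) / (k c(chi_s)) is >= theta.
            lo, hi, best = 1, r - x[s], None
            while lo <= hi:
                mid = (lo + hi) // 2
                if (f(plus(s, mid)) - fx) / (mid * c_unit[s]) >= theta:
                    best, lo = mid, mid + 1
                else:
                    hi = mid - 1
            if best is not None:
                x[s] += best
                fx = f(x)
                if fx >= alpha:
                    return x  # feasible: stop, as in line 6 of Algorithm 1
        theta *= 1 - eps
    return x

def unit_greedy(f, r, alpha, n):
    # Unit-step greedy on the lattice (the naive reduction with c = ||x||_1).
    x = [0] * n
    fx = f(x)
    while fx < alpha:
        best_s, best_val = None, None
        for s in range(n):
            if x[s] < r:
                y = list(x)
                y[s] += 1
                val = f(y)
                if best_val is None or val > best_val:
                    best_s, best_val = s, val
        x[best_s] += 1
        fx = best_val
    return x

W = (3.0, 2.0, 1.0)  # hypothetical weights

def f_sqrt(x):
    # separable and concave in each coordinate, hence monotone DR-submodular
    return sum(w * math.sqrt(v) for w, v in zip(W, x))

n, r, eps, delta = 3, 10_000, 0.1, 0.01
alpha = 0.5 * f_sqrt([r] * n)

f1, calls_dt = counted(f_sqrt)
x_dt = decreasing_threshold(f1, [1.0] * n, r, alpha, eps, delta)
f2, calls_gr = counted(f_sqrt)
x_gr = unit_greedy(f2, r, alpha, n)
print('threshold:', sum(x_dt), 'cost,', calls_dt[0], 'oracle calls')
print('unit greedy:', sum(x_gr), 'cost,', calls_gr[0], 'oracle calls')
```

The unit-step greedy performs one iteration per unit of ‖x‖_1, so its work grows with the magnitude of the solution. By contrast, each threshold level of the sketch costs only O(n log r) oracle calls.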
However, this straightforward reduction only yields a pseudo-polynomial time algorithm since |S̃| = nr; even if the original algorithm ran in linear time, the resulting algorithm would require O(nr) time. This difference is not negligible, since r can be quite large in practical applications, as illustrated by our experimental evaluation.\n\nLazy Evaluation. Finally, we note that the lazy evaluation technique [11, 14], which significantly reduces runtime in practice, can be combined with our algorithm. Specifically, we first push all the elements of S into a max-priority queue, where the key of an element s ∈ S is f(χ_s)/c(χ_s). Then the inner loop of Algorithm 1 is modified as follows. Instead of checking all the elements in S, we pop elements whose keys are at least θ. For each popped element s ∈ S, we find k with 0 < k ≤ r − x(s) and f(kχ_s | x)/(k c(χ_s)) ≥ θ by binary search. If there is such k, we update x to x + kχ_s. Finally, we push s again with the key f(χ_s | x)/c(χ_s) if x(s) < r. The correctness of this technique follows from the DR-submodularity of f. The key of each element s ∈ S in the queue is always at least f(χ_s | x)/c(χ_s), where x is the current vector, and f(kχ_s | x)/(k c(χ_s)) ≤ f(χ_s | x)/c(χ_s) for every k ≥ 1. Hence, we never miss an s ∈ S with f(kχ_s | x)/(k c(χ_s)) ≥ θ.\n\n4 Experiments\n\n4.1 Experimental Setting\n\nWe conducted experiments on a Linux server with an Intel Xeon E5-2690 (2.90 GHz) processor and 256 GB of main memory. The experiments required at most 4 GB of memory. All the algorithms were implemented in C++ and compiled with g++ 4.6.3.\n\nIn our experiments, the cost function c : Z^S_+ → R_+ is always chosen as c(x) = ‖x‖_1 := ∑_{s∈S} x(s). Let f : Z^S_+ → R_+ be a submodular function and α be the worst quality guarantee.\n\nWe implemented the following four methods:\n\n• Decreasing-threshold is our method with the lazy evaluation technique. We chose δ = 0.01 unless stated otherwise.\n\n• Greedy is a method in which, starting from x = 0, we iteratively increment x(s) for the s ∈ S that maximizes f(x + χ_s) − f(x) until we get f(x) ≥ α. We also implemented the lazy evaluation technique [11] for this method.\n\n• Degree is a method in which we assign x(s) a value proportional to the marginal f(χ_s) − f(0), where ‖x‖_1 is determined by binary search so that f(x) ≥ α. Strictly speaking, x(s) is only approximately proportional to the marginal, since x(s) must be an integer.\n\n• Uniform is a method that returns k1 for the minimum k ∈ Z_+ such that f(k1) ≥ α.\n\nWe use the following real-world and synthetic datasets to confirm the accuracy and efficiency of our method against the other methods. We set r = 100,000 for both problems.\n\nSensor placement. We used a dataset acquired by running simulations on a 129-vertex sensor network used in the Battle of the Water Sensor Networks (BWSN) [15]. We used the “bwsn-utilities” [1] program to simulate 3000 random injection events on this network for a duration of 96 hours. Let S and E be the set of the 129 sensors in the network and the set of the 3000 events, respectively. For each sensor s ∈ S and event e ∈ E, a value z(s, e) is provided, which denotes the time, in minutes, that the pollution takes to reach s after the injection.¹\n\nWe define a function f : Z^S_+ → R_+ as follows. Let x ∈ Z^S_+ be a vector, where we regard x(s) as the energy level of the sensor s. Suppose that when the pollution reaches a sensor s, the probability that we can detect it is 1 − (1 − p)^{x(s)}, where p = 0.0001. In other words, by spending unit energy, we obtain an extra chance of detecting the pollution with probability p. For each event e ∈ E, let s_e be the first sensor where the pollution is detected in that injection event. Note that s_e is a random variable. Let z_∞ = max_{e∈E, s∈S} z(s, e). Then, we define f as follows:\n\nf(x) = E_{e∈E} E_{s_e}[z_∞ − z(s_e, e)],\n\nwhere z(s_e, e) is defined as z_∞ when no sensor managed to detect the pollution. Intuitively speaking, E_{s_e}[z_∞ − z(s_e, e)] expresses how much time we managed to save in the event e on average. Then, we take the average over all the events. A similar function was also used in [11] to measure the performance of a sensor allocation, although they only considered the case p = 1. This corresponds to the case that by spending unit energy at a sensor s, we can always detect the pollution that has reached s. We note that f(x) is DR-submodular (see Lemma F.1 for the proof).\n\nBudget allocation problem. In order to observe the behavior of our algorithm on large-scale instances, we created a synthetic instance of the budget allocation problem [2, 17] as follows. The instance can be represented as a bipartite graph (S, T; E), where S is a set of 5,000 vertices and T is a set of 50,000 vertices. We regard a vertex in S as an ad source and a vertex in T as a person. Then, we fix the degrees of vertices in S so that their distribution obeys the power law with exponent γ := 2.5; that is, the fraction of ad sources with out-degree d is proportional to d^{−γ}. For a vertex s ∈ S with prescribed degree d, we choose d vertices in T uniformly at random and connect them to s with edges. We define a function f : Z^S_+ → R_+ as\n\nf(x) = ∑_{t∈T} (1 − ∏_{s∈Γ(t)} (1 − p)^{x(s)}),    (6)\n\nwhere Γ(t) is the set of vertices connected to t and p = 0.0001. 
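Both experimental objectives can be evaluated exactly on small inputs. The Python sketch below is illustrative only. budget_objective evaluates (6) from the neighbor lists Γ(t). sensor_objective evaluates the sensor placement objective by scanning the sensors of each event in order of arrival time. This rests on an assumption not spelled out above: detections at different sensors are independent. The input layouts gamma and z, and the toy values, are hypothetical.

```python
import math

def budget_objective(x, gamma, p):
    # Objective (6): expected number of influenced people, where gamma[t]
    # lists the ad sources connected to person t (the set Gamma(t)).
    return sum(1 - math.prod((1 - p) ** x[s] for s in srcs) for srcs in gamma)

def sensor_objective(x, z, p):
    # z[e][s]: minutes until the pollution of event e reaches sensor s.
    # Assumes detections at different sensors are independent; if no sensor
    # detects, the saving is z_inf - z_inf = 0, so that case adds nothing.
    z_inf = max(max(row) for row in z)
    total = 0.0
    for row in z:
        undetected = 1.0  # probability that no earlier sensor detected it
        saved = 0.0       # E[z_inf - z(s_e, e)] for this event
        for s in sorted(range(len(row)), key=lambda t: row[t]):
            q = 1 - (1 - p) ** x[s]  # detection probability at sensor s
            saved += undetected * q * (z_inf - row[s])
            undetected *= 1 - q
        total += saved
    return total / len(z)  # average over events

gamma = [[0], [0, 1], [1]]   # toy bipartite graph with two ad sources
z = [[10, 30], [20, 5]]      # toy arrival times: two events, two sensors
print(budget_objective([1, 0], gamma, 0.5))  # 1.0
print(sensor_objective([1, 1], z, 0.5))      # 12.5
```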
Here, we suppose that investing a unit of cost in an ad source s ∈ S gives an extra chance, with probability p, of influencing each person t ∈ T with s ∈ Γ(t). Then, f(x) can be seen as the expected number of people influenced by the ad sources. We note that f is known to be a monotone DR-submodular function [17].\n\n4.2 Experimental Results\n\nFigure 1 illustrates the obtained objective value ‖x‖_1 for various choices of the worst quality guarantee α on each dataset. We chose ε = 0.01 in Decreasing threshold. We can observe that Decreasing threshold attains almost the same objective value as Greedy, and it outperforms Degree and Uniform.\n\nFigure 2 illustrates the runtime for various choices of the worst quality guarantee α on each dataset, again with ε = 0.01 in Decreasing threshold. We can observe that the runtime of Decreasing threshold grows significantly more slowly than that of Greedy.\n\n¹ Although three other values are provided, they showed similar empirical results, so we omit them.\n\nFigure 1: Objective values. (a) Sensor placement (BWSN); (b) Budget allocation (synthetic).\n\nFigure 2: Runtime. (a) Sensor placement (BWSN); (b) Budget allocation (synthetic).\n\nFigure 3: Effect of ε. (a) Relative cost increase; (b) Runtime.\n\nFigures 3(a) and 3(b) show the relative increase of the objective value and the runtime, respectively, of our method against Greedy on the BWSN dataset. We can observe that the relative increase of the objective value gets smaller as α increases. This phenomenon can be explained by considering the extreme case α = f(r1), the largest attainable value. 
In this case, we need to choose x = r1 anyway in order to achieve the worst quality guarantee, so the order in which the coordinates of x are increased does not matter. We can also see that the empirical runtime grows with 1/ε, which matches our theoretical bound.\n\n5 Proof of Theorem 3.2\n\nIn this section, we outline the proof of the main theorem. Proofs of some minor claims can be found in the Appendix.\n\nFirst, we introduce some notation. Let us assume that x is updated L times in the algorithm. Let x_i be the variable x after the i-th update (i = 0, ..., L). Note that x_0 = 0 and x_L is the final output of the algorithm. Let s_i ∈ S and k_i ∈ Z_+ be the pair used in the i-th update for i = 1, ..., L; that is, x_i = x_{i−1} + k_iχ_{s_i} for i = 1, ..., L. Let μ_0 := 0 and μ_i := k_i c(χ_{s_i}) / f(k_iχ_{s_i} | x_{i−1}) for i = 1, ..., L. Let μ̂_0 := 0 and μ̂_i := θ_i^{−1} for i = 1, ..., L, where θ_i is the threshold value on the i-th update. Note that μ̂_{i−1} ≤ μ̂_i for i = 1, ..., L. Let x* be an optimal solution such that ρ · c(x*) = ∑_{s∈{x*}} c(χ_s).\n\nIn the i-th update, we regard each element s ∈ {x*} as being charged the value μ_i(f(χ_s | x_{i−1}) − f(χ_s | x_i)). Then, the total charge on {x*} is defined as\n\nT(x, f) := ∑_{s∈{x*}} ∑_{i=1}^{L} μ_i(f(χ_s | x_{i−1}) − f(χ_s | x_i)).\n\nClaim 5.1. Fix 1 ≤ i ≤ L arbitrarily and let θ be the threshold value on the i-th update. 
Then,\n\nf(k_iχ_{s_i} | x_{i−1}) / (k_i c(χ_{s_i})) ≥ θ   and   f(χ_s | x_{i−1}) / c(χ_s) ≤ θ/(1 − ε)   (s ∈ S).\n\nEliminating θ from the inequalities in Claim 5.1, we obtain\n\nk_i c(χ_{s_i}) / f(k_iχ_{s_i} | x_{i−1}) ≤ (1/(1 − ε)) · c(χ_s) / f(χ_s | x_{i−1})   (i = 1, ..., L, s ∈ S).    (7)\n\nFurthermore, we have μ_i ≤ μ̂_i ≤ (1/(1 − ε)) μ_i for i = 1, ..., L.\n\nClaim 5.2. c(x) ≤ (1/(1 − ε)) T(x, f).\n\nClaim 5.3. For each s ∈ {x*}, the total charge on s is at most (1/(1 − ε))(1 + log(d/β)) c(χ_s).\n\nProof. Let us fix s ∈ {x*} and let l be the minimum i such that f(χ_s | x_i) = 0. By (7), we have\n\nμ_i = k_i c(χ_{s_i}) / f(k_iχ_{s_i} | x_{i−1}) ≤ (1/(1 − ε)) · c(χ_s) / f(χ_s | x_{i−1})   (i = 1, ..., l).\n\nSince f(χ_s | x_i) = 0 for all i ≥ l by monotonicity and DR-submodularity, we have\n\n∑_{i=1}^{L} μ_i(f(χ_s | x_{i−1}) − f(χ_s | x_i))\n  = ∑_{i=1}^{l−1} μ_i(f(χ_s | x_{i−1}) − f(χ_s | x_i)) + μ_l f(χ_s | x_{l−1})\n  ≤ (1/(1 − ε)) c(χ_s) (∑_{i=1}^{l−1} (f(χ_s | x_{i−1}) − f(χ_s | x_i)) / f(χ_s | x_{i−1}) + f(χ_s | x_{l−1}) / f(χ_s | x_{l−1}))\n  = (1/(1 − ε)) c(χ_s) (1 + ∑_{i=1}^{l−1} (1 − f(χ_s | x_i) / f(χ_s | x_{i−1})))\n  ≤ (1/(1 − ε)) c(χ_s) (1 + ∑_{i=1}^{l−1} log(f(χ_s | x_{i−1}) / f(χ_s | x_i)))   (since 1 − 1/x ≤ log x for x ≥ 1)\n  = (1/(1 − ε)) c(χ_s) (1 + log(f(χ_s | x_0) / f(χ_s | x_{l−1})))\n  ≤ (1/(1 − ε)) (1 + log(d/β)) c(χ_s),\n\nwhere the last step uses f(χ_s | x_0) = f(χ_s) ≤ d and f(χ_s | x_{l−1}) ≥ β.\n\nProof of Theorem 3.2. Combining these claims, we have\n\nc(x) ≤ (1/(1 − ε)) · T(x, f) ≤ (1/(1 − ε)^2) · (1 + log(d/β)) · ∑_{s∈{x*}} c(χ_s) ≤ (1 + 3ε) · (1 + log(d/β)) · ρ c(x*).\n\nThe last inequality uses ∑_{s∈{x*}} c(χ_s) = ρ c(x*) and (1 − ε)^{−2} ≤ 1 + 3ε, which holds for 0 < ε ≤ 0.2. Thus, x is an approximate solution with the desired ratio.\n\nLet us see that x approximately satisfies the constraint, that is, f(x) ≥ (1 − δ)α. Consider a slightly modified version of the algorithm in which the threshold keeps being updated until f(x) = α. Let x′ be the output of the modified algorithm. 
Then, we have\n\n$$f(x') - f(x) \\le \\sum_{s \\in \\{x'\\}} f(\\chi_s \\mid x) \\le \\sum_{s \\in \\{x'\\}} \\frac{\\delta c(\\chi_s)}{c_{\\max} n r} d \\le \\delta d \\le \\delta \\alpha.$$\n\nThe third inequality holds since $c(\\chi_s) \\le c_{\\max}$ and $|\\{x'\\}| \\le nr$. Thus, $f(x) \\ge (1 - \\delta)\\alpha$.\n\n6 Conclusions\n\nIn this paper, motivated by real scenarios in machine learning, we generalized the submodular cover problem via the diminishing return property over the integer lattice. We proposed a bicriteria approximation algorithm with the following properties: (i) The approximation ratio to the cost almost matches the one guaranteed by the greedy algorithm [21] and is almost tight in general. (ii) The solution satisfies the worst-case quality guarantee up to the desired accuracy. (iii) The running time of our algorithm is roughly $O(n \\log(nr) \\log r)$. The dependency on $r$ is exponentially better than that of the greedy algorithm. Our experiments confirmed that, compared with the greedy algorithm, our algorithm achieves almost the same solution quality while running several orders of magnitude faster.\n\nAcknowledgments\n\nThe first author is supported by JSPS Grant-in-Aid for JSPS Fellows. The second author is supported by JSPS Grant-in-Aid for Young Scientists (B) (No. 26730009), MEXT Grant-in-Aid for Scientific Research on Innovative Areas (24106003), and JST, ERATO, Kawarabayashi Large Graph Project. The authors thank Satoru Iwata and Yuji Nakatsukasa for reading a draft of this paper.\n\nReferences\n[1] http://www.water-simulation.com/wsp/about/bwsn/.\n[2] N. Alon, I. Gamzu, and M. Tennenholtz. Optimizing budget allocation among channels and influencers. In Proc. of WWW, pages 381\u2013388, 2012.\n[3] A. Badanidiyuru and J. Vondr\u00e1k. Fast algorithms for maximizing submodular functions. In Proc. of SODA, pages 1497\u20131514, 2014.\n[4] Y. Chen, H. Shioi, C. A. F. Montesinos, L. P.
Koh, S. Wich, and A. Krause. Active detection via adaptive submodularity. In Proc. of ICML, pages 55\u201363, 2014.\n[5] R. Iyer and J. Bilmes. Submodular optimization with submodular cover and submodular knapsack constraints. In Proc. of NIPS, pages 2436\u20132444, 2013.\n[6] M. Kapralov, I. Post, and J. Vondrak. Online submodular welfare maximization: Greedy is optimal. In Proc. of SODA, pages 1216\u20131225, 2012.\n[7] D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. In Proc. of KDD, pages 137\u2013146, 2003.\n[8] A. Krause and D. Golovin. Submodular function maximization. In Tractability: Practical Approaches to Hard Problems, pages 71\u2013104. Cambridge University Press, 2014.\n[9] A. Krause and J. Leskovec. Efficient sensor placement optimization for securing large water distribution networks. Journal of Water Resources Planning and Management, 134(6):516\u2013526, 2008.\n[10] A. Krause, A. Singh, and C. Guestrin. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. The Journal of Machine Learning Research, 9:235\u2013284, 2008.\n[11] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance. Cost-effective outbreak detection in networks. In Proc. of KDD, pages 420\u2013429, 2007.\n[12] H. Lin and J. Bilmes. Multi-document summarization via budgeted maximization of submodular functions. In Proc. of NAACL, pages 912\u2013920, 2010.\n[13] H. Lin and J. Bilmes. A class of submodular functions for document summarization. In Proc. of NAACL, pages 510\u2013520, 2011.\n[14] M. Minoux. Accelerated greedy algorithms for maximizing submodular set functions. Optimization Techniques, Lecture Notes in Control and Information Sciences, 7:234\u2013243, 1978.\n[15] A.
Ostfeld, J. G. Uber, E. Salomons, J. W. Berry, W. E. Hart, C. A. Phillips, J.-P. Watson, G. Dorini, P. Jonkergouw, Z. Kapelan, F. di Pierro, S.-T. Khu, D. Savic, D. Eliades, M. Polycarpou, S. R. Ghimire, B. D. Barkdoll, R. Gueli, J. J. Huang, E. A. McBean, W. James, A. Krause, J. Leskovec, S. Isovitsch, J. Xu, C. Guestrin, J. VanBriesen, M. Small, P. Fischbeck, A. Preis, M. Propato, O. Piller, G. B. Trachtman, Z. Y. Wu, and T. Walski. The battle of the water sensor networks (BWSN): A design challenge for engineers and algorithms. Journal of Water Resources Planning and Management, 134(6):556\u2013568, 2008.\n[16] R. Raz and S. Safra. A sub-constant error-probability low-degree test, and a sub-constant error-probability PCP characterization of NP. In Proc. of STOC, pages 475\u2013484, 1997.\n[17] T. Soma, N. Kakimura, K. Inaba, and K. Kawarabayashi. Optimal budget allocation: Theoretical guarantee and efficient algorithm. In Proc. of ICML, 2014.\n[18] H. O. Song, R. Girshick, S. Jegelka, J. Mairal, Z. Harchaoui, and T. Darrell. On learning to localize objects with minimal supervision. In Proc. of ICML, 2014.\n[19] M. Sviridenko, J. Vondr\u00e1k, and J. Ward. Optimal approximation for submodular and supermodular optimization with bounded curvature. In Proc. of SODA, pages 1134\u20131148, 2015.\n[20] P.-J. Wan, D.-Z. Du, P. Pardalos, and W. Wu. Greedy approximations for minimum submodular cover with submodular cost. Computational Optimization and Applications, 45(2):463\u2013474, 2009.\n[21] L. A. Wolsey. An analysis of the greedy algorithm for the submodular set covering problem. Combinatorica, 2(4):385\u2013393, 1982.", "award": [], "sourceid": 538, "authors": [{"given_name": "Tasuku", "family_name": "Soma", "institution": "University of Tokyo"}, {"given_name": "Yuichi", "family_name": "Yoshida", "institution": "National Institute of Informatics and Preferred Infrastructure, Inc."}]}