{"title": "Learning Erdos-Renyi Random Graphs via Edge Detecting Queries", "book": "Advances in Neural Information Processing Systems", "page_first": 404, "page_last": 414, "abstract": "In this paper, we consider the problem of learning an unknown graph via queries on groups of nodes, with the result indicating whether or not at least one edge is present among those nodes. While learning arbitrary graphs with $n$ nodes and $k$ edges is known to be hard in the sense of requiring $\\Omega( \\min\\{ k^2 \\log n, n^2\\})$ tests (even when a small probability of error is allowed), we show that learning an Erd\\H{o}s-R\\'enyi random graph with an average of $\\kbar$ edges is much easier; namely, one can attain asymptotically vanishing error probability with only $O(\\kbar \\log n)$ tests. We establish such bounds for a variety of algorithms inspired by the group testing problem, with explicit constant factors indicating a near-optimal number of tests, and in some cases asymptotic optimality including constant factors. In addition, we present an alternative design that permits a near-optimal sublinear decoding time of $O(\\kbar \\log^2 \\kbar + \\kbar \\log n)$.", "full_text": "Learning Erd\u02ddos-R\u00e9nyi Random Graphs\n\nvia Edge Detecting Queries\n\nZihan Li\n\nNational University of Singapore\n\nlizihan@u.nus.edu\n\nMatthias Fresacher\nUniversity of Adelaide\n\nmatthias.fresacher@adelaide.edu.au\n\nJonathan Scarlett\n\nNational University of Singapore\nscarlett@comp.nus.edu.sg\n\nAbstract\n\nIn this paper, we consider the problem of learning an unknown graph via queries\non groups of nodes, with the result indicating whether or not at least one edge is\npresent among those nodes. 
While learning arbitrary graphs with n nodes and k edges is known to be hard in the sense of requiring Ω(min{k² log n, n²}) tests (even when a small probability of error is allowed), we show that learning an Erdős-Rényi random graph with an average of k̄ edges is much easier; namely, one can attain asymptotically vanishing error probability with only O(k̄ log n) tests. We establish such bounds for a variety of algorithms inspired by the group testing problem, with explicit constant factors indicating a near-optimal number of tests, and in some cases asymptotic optimality including constant factors. In addition, we present an alternative design that permits a near-optimal sublinear decoding time of O(k̄ log² k̄ + k̄ log n).

1 Introduction

Graphs are a ubiquitous tool in modern statistics and machine learning for depicting interactions, relations, and physical connections in networks, such as social networks, biological networks, sensor networks, and so on. Often, the graph is not known a priori, and must be learned via queries to the network. In this paper, we consider the problem of graph learning via edge detecting queries, where each query contains a subset of the nodes, and the binary outcome indicates whether or not there is at least one edge among these nodes. See Section 1.1 for previous work on this problem.

An application of this problem highlighted in previous works such as [16] is that of learning which chemicals react with each other, using tests that are able to detect whether any reaction occurs. Another potential application is learning connectivity in large wireless networks: Each node has a unique identifier, and in response to a query, a node sends feedback to a central unit if the query includes both itself and at least one of its neighbors. 
Then, to attain the query outcome, the central unit only has to detect whether any feedback signal was received.

We consider the fundamental question of how many queries are needed to learn the graph. Under adaptive testing (i.e., tests can be designed based on previous outcomes), this question is well-understood [30], as outlined below. However, an impossibility result of [1] indicates that considerably more non-adaptive tests are needed in the worst-case sense for the class of graphs with a bounded number of edges. We show that this picture is much more positive in the average-case sense by studying the average performance with respect to Erdős-Rényi graphs [14]. In addition, to demonstrate that these findings are not overly reliant on the specific random graph model, we also present similar findings assuming only bounds on the number of edges and the maximum degree (see Appendix H).

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

1.1 Related Work

The problem considered in this paper can be viewed as a constrained group testing problem [8, Sec. 5.8]. We highlight the most relevant group testing works throughout the paper, and here simply refer the reader to [23] for a survey of the zero-error setting, and to [8] for a survey of the small-error setting (i.e., the algorithm is allowed a small probability of failure). These settings are fundamentally different, since the number of tests in the small-error setting is O(K log N) (for K defectives among N items), while the zero-error criterion requires Ω(min{N, K²}) tests.

Early works on graph learning via edge detecting queries considered identifying a single edge [3, 4] and then several edges [30] in a slightly more general scenario where the "defective graph" G is known to be a sub-graph of a larger graph H. 
Several works considered specific graph classes such as matchings, stars, and cliques [9, 10, 27]. We particularly highlight the work of Johann [30], who gave an adaptive procedure requiring k log₂(|Ẽ|/k) + O(k) tests, where Ẽ is the set of edges in the larger graph H; this bound is optimal up to the O(k) remainder term. More recently, extensions to hypergraphs have also been considered [2, 11, 12, 24].

While the adaptive setting is well-understood, the non-adaptive setting [1, 32] and adaptive settings with limited stages [1, 17, 28] are more challenging. We refer the reader to [1] for a recent survey of what is known, with a notable distinction between Monte Carlo and Las Vegas style algorithms. We highlight that in stark contrast with the standard group testing problem, the number of non-adaptive tests required to identify arbitrary graphs with k edges and n nodes is at least Ω(min{k² log n, n²}), even under the small-error criterion.¹

1.2 Contributions

The Ω(min{k² log n, n²}) hardness result given in [1] holds with respect to worst-case graphs containing k edges, which raises the question of whether some notion of average-case or further restricted graph classes can overcome this inherent difficulty. We focus primarily on the average case with respect to the ubiquitous Erdős-Rényi random graph model² and the small-error criterion, showing that indeed the number of tests required reduces to O(k̄ log n) for graphs with an average of k̄ edges, and providing fairly tight explicit constant factors. In Appendix H, we describe how to attain similar results for general graphs with at most k edges and maximum degree d = o(√k), albeit with slightly worse constant factors.

In more detail, we show the following for Erdős-Rényi random graphs:

• We provide a simple algorithm-independent lower bound based on counting the number of graphs within a high-probability set;
• We extend the COMP, DD, and SSS decoding algorithms [6, 19] from standard group testing to the graph learning problem, and provide upper and lower bounds on their asymptotic performance under a natural random test design.
• We propose a sublinear-time decoding algorithm (and its associated test design) based on the GROTESQUE algorithm [18], and show that it succeeds with high probability with O(k̄ log² k̄ + k̄ log n) decoding time, thus nearly matching an Ω(k̄ log(n²/k̄)) lower bound.

Briefly, the above-mentioned decoding algorithms are described as follows: COMP (cf., Section 4.1) assumes all pairs are edges unless their nodes are both in some negative test, DD (cf., Section 4.2) uses the COMP solution to identify "possible edges" and then declares a pair to be an edge only if it is the unique possible edge among the nodes in some test, and SSS (cf., Section 5) solves an integer program to find the sparsest graph consistent with the test outcomes.

While the group testing algorithms themselves extend easily to our setting, their theoretical analyses require significant additional effort (see Appendix I for further discussion). 
For instance, for group testing, the analysis is symmetric with respect to any defective set of size k, whereas for graph learning, different graphs with a fixed number of edges can behave very differently, and even seemingly simple tasks (e.g., determining the probability of a positive test) become challenging.

¹Note that the number of items N in the standard group testing problem corresponds to \binom{n}{2} = Θ(n²) in the graph learning problem with n nodes, since pairs of nodes (i.e., potential edges) play the role of items. See Appendix I for a brief description of the group testing problem.
²More precisely, we consider the variant introduced by Gilbert [26].

With the exception of the sublinear-time decoding part, our results are summarized in Figure 1, where we plot the asymptotic ratio between the information-theoretic bound and the number of tests for each algorithm, for sparsity levels θ ∈ (0, 1) such that k̄ = Θ(n^{2θ}). We observe the following:

• For θ > 1/2, the DD upper bound and SSS lower bound match under i.i.d. random testing. As we explain in Section 5, SSS is the optimal algorithm, so if it fails then so does any algorithm. Hence, DD is asymptotically optimal (including constants) under i.i.d. random testing for θ > 1/2.
• For θ ≤ 1/2, DD succeeds with fewer than twice as many tests as the optimal information-theoretic threshold; the latter is a converse bound applying to any test design (not only i.i.d. random testing).

Figure 1: Asymptotic values of (k̄ log₂(1/q))/#Tests versus the sparsity parameter θ, for recovering Erdős-Rényi random graphs with edge probability q = Θ(n^{2(θ−1)}) and average number of edges k̄ = q\binom{n}{2}. The "COMP" and "DD" curves are achievability bounds, whereas the "Converse" and "SSS" curves are converse bounds (for arbitrary test designs and i.i.d. random test designs, respectively).

While analogous results have been established for standard group testing [6], we again highlight that the analysis comes with several non-trivial challenges, particularly when it comes to DD and SSS. See Appendix I for an outline of the main differences.

2 Setup

We seek to learn an unknown undirected graph G = (V, E) with n nodes, i.e., the vertex set is V = {1, . . . , n}, and the edge set E contains up to \binom{n}{2} pairs of nodes. We adopt a random graph model in which each edge appears in the graph independently with probability q (i.e., the Erdős-Rényi graph ER(n, q)). After the graph G is randomly drawn, it is fixed throughout the entire testing process (described below).

We test the nodes in groups; the output of each test takes the form

Y = ⋁_{(i,j)∈E} {X_i ∩ X_j},    (1)

where the binary-valued test vector X = (X_1, . . . , X_n) indicates which nodes are included in the test. That is, the resulting output Y = 1 if and only if at least one edge exists in the sub-graph of G induced by the nodes included in the test; we henceforth use the terminology that such an edge is covered. We refer to tests with Y = 1 as positive, and tests with Y = 0 as negative. 
A total of t tests are performed according to the test vectors X(1), . . . , X(t) to produce the outcomes Y(1), . . . , Y(t). We focus on non-adaptive tests, where X(1), . . . , X(t) must be selected prior to observing any outcomes.

Given the tests and their outcomes, a decoder forms an estimate Ĝ of the graph G, or equivalently, an estimate Ê of the edge set E. One wishes to design a sequence of tests X(1), . . . , X(t), with t ideally as small as possible, such that the decoder recovers G with probability arbitrarily close to one. The error probability is given by

Pe := P[Ĝ ≠ G],    (2)

and is taken over the randomness of the graph G, as well as the tests X(1), . . . , X(t) (if randomized). We only consider deterministic decoding algorithms (without loss of optimality), and all of our results are asymptotic in the limit as n → ∞ (with q varying as a function of n).

2.1 Sparsity Level

We focus our attention on sparse graphs, i.e., q = o(1) as n → ∞.³ More specifically, we consider the sublinear scaling regime q = Θ(n^{−2(1−θ)}) for some θ ∈ (0, 1), meaning that the average number of edges k̄ = \binom{n}{2} q behaves as Θ(n^{2θ}). By the assumption θ ∈ (0, 1), we also have

n^{−(2−η)} ≪ q ≪ n^{−η},   n^{η} ≪ k̄ ≪ n^{2−η}    (3)

for sufficiently small (but constant) η > 0 and sufficiently large n. Here and subsequently, we write f(n) ≪ g(n) as a shorthand for f(n) = o(g(n)).

2.2 Bernoulli Random Testing

For the most part, we will focus on the case that the tests are designed randomly: Each node is independently placed in each test with a given probability p. We refer to this as i.i.d. Bernoulli testing, or simply Bernoulli testing for short. 
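This model is straightforward to simulate. The following self-contained sketch (our own code and variable names, not from the paper) draws G ∼ ER(n, q), forms t i.i.d. Bernoulli tests using the parametrization p = √(2ν/(qn²)) with ν = 1 discussed in the paper, and computes the outcomes according to (1):

```python
import itertools
import math
import random

def sample_er_graph(n, q, rng):
    """Sample ER(n, q): each of the C(n,2) possible edges appears independently w.p. q."""
    return {(i, j) for i, j in itertools.combinations(range(n), 2) if rng.random() < q}

def bernoulli_tests(n, t, p, rng):
    """Non-adaptive i.i.d. Bernoulli design: node j is placed in test i w.p. p."""
    return [{j for j in range(n) if rng.random() < p} for _ in range(t)]

def outcome(test, edges):
    """Observation model (1): a test is positive iff it covers at least one edge."""
    return any(i in test and j in test for (i, j) in edges)

rng = random.Random(0)
n, q = 100, 0.01
p = math.sqrt(2 * 1.0 / (q * n ** 2))  # p = sqrt(2*nu/(q n^2)) with nu = 1
G = sample_er_graph(n, q, rng)
tests = bernoulli_tests(n, t=200, p=p, rng=rng)
outcomes = [outcome(T, G) for T in tests]
```

Here `outcome` plays the role of Y in (1): it returns 1 (True) exactly when both endpoints of some edge are included in the test.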
Analogous designs are known to lead to most of the best-known performance bounds in the group testing literature [6, 37], with the exception of some slight improvements shown recently via more structured random designs [20, 31].

We parametrize p as p = √(2ν/(qn²)) for some constant ν > 0, as this scaling regime turns out to be optimal in all cases (with the choice ν = 1 further being optimal for the algorithms we consider). Note that this choice of p gives p² = (ν/k̄)(1 + o(1)), since k̄ = ½qn²(1 + o(1)).

When studying probabilities associated with a single random Bernoulli test, we will denote the test outcome by Y, and the (random) indices of nodes included in the test by L ⊆ {1, . . . , n}. In addition, P_G[·] denotes probability (with respect to the random testing alone) when the underlying graph is G.

2.3 Typical Graphs

Throughout the paper, we frequently make use of the following typical set of graphs:

T(εn) = { G : (1 − εn)k̄ ≤ k ≤ (1 + εn)k̄,  d ≤ dmax,  (1 − εn)(1 − e^{−ν}) ≤ P_G[Y = 1] ≤ (1 + εn)(1 − e^{−ν}) },    (4)

where k = |E| is the number of edges, d is the maximum degree of G, and

dmax = 2nq if θ > 1/2,  and  dmax = log n if θ ≤ 1/2.    (5)

The following lemma justifies the terminology typical set by showing that the random graph lies in this set with probability approaching one.

Lemma 1. Fix θ ∈ (0, 1), and let G ∼ ER(n, q) for some q = Θ(n^{−2(1−θ)}). 
In addition, suppose that P_G[Y = 1] in (4) is defined with respect to Bernoulli(p) testing with p = √(2ν/(qn²)) for fixed ν > 0. Then, there exists a sequence εn → 0 such that P[G ∈ T(εn)] → 1 as n → ∞.

The condition on k in the typical set simply states that the number of edges is close to its mean, which follows by standard concentration bounds. The bound on the maximum degree is similarly standard and straightforward to establish. By far the most challenging part is bounding P_G[Y = 1] with high probability; this is done using the inclusion-exclusion principle (i.e., Bonferroni's inequalities) and carefully bounding the probability of a random test containing one edge, two edges, three edges, and so on. The details are given in Appendix A.

Using the bounds on k and d in (4), along with the fact that q satisfies (3), we readily observe that

d² ≪ k̄,   dp ≪ 1    (6)

in both cases of (5). Note that the second of these statements follows immediately from the first since we focus on the regime p = Θ(1/√k̄).

³In fact, the arguments given in [5] can be applied to the present setting to show that one essentially cannot improve on individual testing of edges when q = Θ(1).

Algorithm 1 Combinatorial Orthogonal Matching Pursuit (COMP)
Input: Test designs {L(i)}_{i=1}^{t}, outcomes Y = (Y(1), . . . , Y(t))
1: Initialize Ê to contain all \binom{n}{2} edges
2: for each i such that Y(i) = 0 do
3:   Remove all edges from Ê whose nodes are both in L(i)
4: return Ĝ = (V, Ê)

3 Algorithm-Independent Converse Bound

To provide a benchmark for our upper bounds, we first provide a simple algorithm-independent lower bound on the number of tests for attaining asymptotically vanishing error probability, which is based on fairly standard counting arguments and Fano's inequality [21, Sec. 2.10].

Theorem 1. Under the setup of Section 2 with q = o(1) and an arbitrary non-adaptive test design, in order to achieve Pe → 0 as t → ∞, it is necessary that

t ≥ ( k̄ log₂(1/q) )(1 − η)    (7)

for arbitrarily small η > 0.

Proof. The proof is based on the fact that the prior uncertainty (entropy) is roughly \binom{n}{2} q log₂(1/q) = k̄ log₂(1/q) bits, whereas each test only reveals one bit of information. See Appendix B for details.

Using a similar analysis to [13], the preceding result can easily be strengthened to the strong converse, stating that Pe is not only bounded away from zero when t is below the threshold given, but tends to one. On the other hand, the proof based on Fano's inequality extends more easily to noisy settings. Extending the result to adaptive test designs (e.g., again see [13]) is also straightforward, but in this paper we focus exclusively on non-adaptive designs.

4 Algorithmic Upper Bounds

4.1 COMP Algorithm

Adopting the terminology from the group testing literature, the COMP algorithm is described in Algorithm 1. The simple idea is that if two nodes appear in a negative test, then the corresponding edge must be absent from G. Hence, all such edges are ruled out, and the remaining edges are declared to be present. 
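In code, Algorithm 1 amounts to a few set operations. The following is our own sketch (not code from the paper); `tests[i]` is the node set L(i) and `outcomes[i]` is Y(i):

```python
import itertools

def comp_decode(n, tests, outcomes):
    """COMP (Algorithm 1): start from all C(n,2) candidate pairs and delete every
    pair whose two nodes appear together in some negative test."""
    est_edges = set(itertools.combinations(range(n), 2))
    for test, y in zip(tests, outcomes):
        if not y:  # negative test: no edge among these nodes
            est_edges -= {e for e in est_edges if e[0] in test and e[1] in test}
    return est_edges
```

Note that, as in standard group testing, COMP never removes a true edge (a true edge's endpoints cannot both lie in a negative test), so any error consists of declaring non-edges as edges.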
Once Lemma 1 is in place, the theoretical analysis of COMP becomes very simple and similar to standard group testing [19], leading to the following.

Theorem 2. Under the setup of Section 2 with q = Θ(n^{2(θ−1)}) for some θ ∈ (0, 1), and Bernoulli testing with parameter ν = 1, the COMP algorithm achieves Pe → 0 as long as

t ≥ ( 2e · k̄ log n )(1 + η)    (8)

for arbitrarily small η > 0.

Proof. The graph properties given in the definition (4) of T(εn) facilitate a direct analysis of the probability that the two nodes of a given non-edge fail to be included together in any negative test, and a union bound over all non-edges establishes the claim. See Appendix C for details.

4.2 DD Algorithm

Since we work on the assumption that edges are rare (i.e., q ≪ 1), one would expect that COMP's approach of assuming edges are present (unless immediately proven otherwise) can be highly suboptimal. The DD algorithm,⁴ described in Algorithm 2, overcomes this limitation by assuming edges are absent unless immediately proven otherwise. The way to prove the presence of the edge is to use COMP to rule out non-edges, mark the remaining pairs as possible edges (PE), and then look for positive tests containing only a single pair from PE.

⁴For the graph learning problem, one may prefer to name the algorithm Definite Edges, but we prefer to maintain consistency with the group testing literature [6].

Algorithm 2 Definite Defectives (DD)
Input: Test designs {L(i)}_{i=1}^{t}, outcomes Y = (Y(1), . . . , Y(t))
1: Initialize Ê = ∅, and initialize PE to contain all \binom{n}{2} edges
2: for each i such that Y(i) = 0 do
3:   Remove all edges from PE whose nodes are both in L(i)
4: for each i such that Y(i) = 1 do
5:   If the nodes from L(i) cover exactly one edge in PE, add that edge to Ê
6: return Ĝ = (V, Ê)

Algorithm 3 Smallest Satisfying Set (SSS)
Input: Test designs {L(i)}_{i=1}^{t}, outcomes Y = (Y(1), . . . , Y(t))
1: Find Ê that minimizes |Ê| subject to φ_Ê(L(i)) = Y(i) for all i = 1, . . . , t, where the function φ_E(L) = ⋁_{(i,j)∈E} {{i, j} ⊆ L} corresponds to the observation model (1).
2: return Ĝ = (V, Ê)

The analysis of DD is a fair bit more challenging than COMP, but gives an improved bound, as stated in the following.

Theorem 3. Under the setup of Section 2 with q = Θ(n^{2(θ−1)}) for some θ ∈ (0, 1), and Bernoulli testing with parameter ν = 1, the DD algorithm achieves Pe → 0 as long as

t ≥ ( 2 max{θ, 1 − θ} e · k̄ log n )(1 + η)    (9)

for arbitrarily small η > 0.

Proof. The proof is based on analyzing the two steps separately. In the first step, we show that with high probability not too many non-edges are included in PE, and in the second step, we show that conditioned on this success event in the first step, each true edge is the unique PE in some test with high probability. 
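The two steps of Algorithm 2 can be sketched in code as follows (our own sketch, not code from the paper; `tests[i]` is the node set L(i) and `outcomes[i]` is Y(i)):

```python
import itertools

def dd_decode(n, tests, outcomes):
    """DD (Algorithm 2): negative tests prune the 'possible edge' (PE) set; an edge
    is then declared only if it is the unique PE covered by some positive test."""
    # Step 1: rule out non-edges using negative tests (as in COMP).
    pe = set(itertools.combinations(range(n), 2))
    for test, y in zip(tests, outcomes):
        if not y:
            pe -= {e for e in pe if e[0] in test and e[1] in test}
    # Step 2: declare an edge when it is the unique possible edge in a positive test.
    est_edges = set()
    for test, y in zip(tests, outcomes):
        if y:
            covered = [e for e in pe if e[0] in test and e[1] in test]
            if len(covered) == 1:
                est_edges.add(covered[0])
    return est_edges
```

Since true edges are never removed from PE, a positive test covering exactly one PE pair must be explained by that pair; hence every declared edge is a true edge, and DD can only err by missing edges.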
The details are given in Appendix D, and the main differences to the standard group testing analysis [6, 40] are highlighted in Appendix I.

5 SSS Algorithm Lower Bound

It is a standard result that under any random graph model, the optimal decoder (in the sense of minimizing Pe = P[Ĝ ≠ G]) is the one that declares Ĝ to be the most probable graph that would have produced the observation vector Y = (Y(1), . . . , Y(t)) if it were the true graph. Under the Erdős-Rényi graph model, graphs with fewer edges are always more likely, so this decoder simply searches for the graph with the fewest edges that is satisfying in the sense of being consistent with Y. This leads to the SSS algorithm described in Algorithm 3. Similarly to [34], this algorithm amounts to an integer program, which may be hard to solve efficiently in general.

Despite this computational challenge, a key utility of studying SSS is as follows. Since it is the optimal decoding algorithm, a lower bound on the number of tests it requires is also a lower bound for any decoding algorithm. In the following theorem, we provide such a lower bound with respect to random Bernoulli test designs. While such a lower bound is, in a sense, weaker than that of Theorem 1 (because that result holds for arbitrary test designs), it leads to the important conclusion that one cannot hope to improve on the bound for DD for θ > 1/2 unless one moves beyond Bernoulli test designs. See Figure 1 for an illustration.

Theorem 4. Under the setup of Section 2 with q = Θ(n^{2(θ−1)}) for some θ ∈ (0, 1), and Bernoulli testing with an arbitrary choice of ν > 0, the SSS algorithm yields Pe → 1 whenever

t ≤ ( 2θe · k̄ log n )(1 − η)    (10)

for arbitrarily small η > 0.

Algorithm 4 Group Testing Quick and Efficient (GROTESQUE) – Informal Outline
Input: Number of bundles B, inclusion probability r
1: Form bundles B_1, . . . , B_B by independently including each node in each B_b with probability r
2: Initialize Ê = ∅
3: for each b = 1, . . . , B do
4:   Perform a multiplicity test (cf., Section 6.2) on B_b
5:   if multiplicity test returned "single edge" then
6:     Perform location test (cf., Section 6.3) on B_b and add the resulting edge to Ê
7: return Ĝ = (V, Ê)

Proof. The proof is based on the fact that if an edge is masked (i.e., its nodes never appear together in any test without those of a different edge), then removing that edge from E will produce a smaller satisfying set, meaning that the algorithm fails to output E. The details are given in Appendix E, and the main differences to the standard group testing analysis [6] are highlighted in Appendix I.

6 Sublinear-Time Decoding

A standard implementation of COMP or DD yields decoding complexity O(n²t), which may be infeasible when n is large and decoding time is limited. To attain sublinear-time decoding, considerably different algorithms are needed, as one certainly cannot rely on marking non-edges one by one. In Algorithm 4, we informally outline a sublinear-time decoding algorithm that builds on the ideas of the GROTESQUE algorithm for group testing [18]. We find it most convenient to formally describe the key components while simultaneously performing the theoretical analysis; see Sections 6.2 and 6.3. 
For the purpose of understanding the algorithm, it suffices to note the following:

• A multiplicity test performs a number tmul of group tests in which the items from a given bundle are included independently with probability 1/√2. By counting the number of positive tests, one can determine with high probability whether or not the bundle covers exactly one edge.
• A location test performs a sequence of carefully-designed tests that permit the identification of the unique edge in a given bundle, provided that bundle indeed only covers one edge.

The resulting number of tests and runtime are given in the following theorem. Note that in contrast to the previous sections, here our focus is on the scaling laws and not the implied constants. This is due to the fact that attaining sharp constant factors with sublinear-time decoding has remained an open challenge even in the simpler group testing setting [15, 18, 29, 33].

Theorem 5. Under the setup of Section 2 with q = Θ(n^{2(θ−1)}) for some θ ∈ (0, 1), the GROTESQUE test design and decoding algorithm achieves Pe → 0 with t = O(k̄ · log k̄ · log² n) tests, and the decoding time behaves as O(k̄ log² k̄ + k̄ log n) with probability approaching one.

The proof is given below after a short discussion. While it may seem unusual to have a decoding time smaller than the number of tests, this is because the decoder is allowed to selectively decide which tests to make use of, and does not end up using them all. (We implicitly assume that fetching the result of a given test can be done in constant time.) Comparing to Theorems 2 and 3, we see that the number of tests performed has increased by a log k̄ · log n factor. 
On the other hand, the decoding time is nearly optimal: An analogous argument to Theorem 1 reveals an Ω(k̄ log n) lower bound, and the upper bound in Theorem 5 matches this result when log² k̄ = O(log n), and more generally comes within at most a single logarithmic factor.

We briefly mention that the encoding time (i.e., placing nodes in tests) is certainly not sublinear, so the advantage of sublinear decoding time is most beneficial when the encoding time does not pose a bottleneck (e.g., due to an efficient parallel implementation and/or pre-processing).

6.1 Proof Step 1 – Bundles of Tests

Since the number of edges k and maximal degree d behave as stated in (4) for some εn = o(1) with probability approaching one (see Lemma 1), it suffices to establish the claims of Theorem 5 conditioned on an arbitrary graph G satisfying such properties. We implicitly condition on such a graph throughout the analysis.

As described in Algorithm 4, we form a number B of "bundles" of tests, where each node is placed in each bundle with probability r ∈ (0, 1). In Appendix F, we use a direct probabilistic analysis to show that under a choice satisfying B = (4k̄ log k̄)(1 + o(1)), we have with probability 1 − o(1) that every edge is the unique one in at least one bundle.

6.2 Proof Step 2 – Multiplicity Tests

We perform a multiplicity test on each bundle by performing a series of (group) tests in which every node is independently included with probability 1/√2. For each such test:

• If there are no edges, the output is always 0;
• If there is exactly one edge, each output equals 1 with probability 1/2;
• If there are multiple edges, each output equals 1 with probability strictly higher than 1/2. 
To see this, first observe that if there are two edges e₁, e₂ among 3 nodes, then the probability of a positive test is P[e₁ ∪ e₂] = P[e₁] + P[e₂] − P[e₁ ∩ e₂] = 1/2 + 1/2 − 1/(2√2) > 0.646, whereas if there are two disjoint edges e₁, e₂ then a similar calculation yields P[e₁ ∪ e₂] ≥ 3/4. Hence, the overall probability of a positive test is at least 0.646.

Based on these observations, we declare each bundle to have a single edge if and only if the proportion of 1's lies in (0, 0.573). Trivially, if the number of edges is zero, we never make a mistake. On the other hand, if the number of edges is one or more than one, we can apply Hoeffding's inequality with a margin of at least 0.073; hence, using tmul tests we have P[misclassification] ≤ 2e^{−2 tmul · 0.073²}. Taking the union bound over the B bundles, and noting that 2 × 0.073² > 0.01, we find that we can classify all of the bundles correctly with probability approaching one when tmul = (100 log B)(1 + o(1)), so that the total number of tests used is tmul·B = (100 B log B)(1 + o(1)).

6.3 Proof Step 3 – Location Tests

For location testing, we assign each node a unique binary string of length L = ⌈log₂ n⌉. In the following, we consider an arbitrary bundle containing a single edge. For ease of presentation, we describe the location test as though the algorithm could adaptively perform tests, and then we describe how the same can be done non-adaptively.

Adaptive location test. The following procedure constructs two binary strings A and B of length L; these strings will index the two nodes in the bundle that have an edge between them. For each ℓ = 1, . . . , L, we do the following:

1. Test all nodes with a 0 in their ℓ-th bit. If the test is positive, label both A_ℓ and B_ℓ as zero.
2. Test all nodes with a 1 in their ℓ-th bit. 
If the test is positive, label both A_ℓ and B_ℓ as one.
3. If neither of the preceding tests is positive, we know that A_ℓ ≠ B_ℓ, so we do the following:
   (a) If this is the first ℓ for which this case is encountered, then assign A_ℓ = 0 and B_ℓ = 1 (the other way around would be equally valid).
   (b) Otherwise, let ℓ′ < ℓ be an index where we encountered this case, and do the following:
      i. Let v ∈ {0, 1} be the bit value that was assigned to A_ℓ′.
      ii. Perform a test containing all nodes whose bit strings (v₁, . . . , v_L) have v_ℓ = v_ℓ′ = v and also all those that have v_ℓ = v_ℓ′ = 1 − v.
      iii. If the test is positive, then assign A_ℓ = v and B_ℓ = 1 − v. Otherwise, assign A_ℓ = 1 − v and B_ℓ = v.

The idea of step 3(b)(ii) is that we already know that the nodes corresponding to A and B must have v and 1 − v in bit position ℓ′ respectively, so we perform the test described to check whether the same is true of position ℓ. If it is not true, then with A and B differing in their ℓ-th bit, the only remaining case is that A and B have 1 − v and v in bit position ℓ respectively.

Non-adaptive location test. The only types of tests that the adaptive algorithm above uses are (i) those used in Steps 1 and 2 with all nodes having a given ℓ-th bit value; and (ii) those used in Step 3(b) containing all nodes with some v_ℓ = v_ℓ′ = v and some other v_ℓ = v_ℓ′ = 1 − v. There are only 2L possible such tests of type (i), and 2\binom{L}{2} possible tests of type (ii). Hence, we can perform the tests non-adaptively by taking all such possible tests in advance. 
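The adaptive procedure in Steps 1–3 above can be made concrete with a short sketch (our own code, not from the paper; `node_bits` maps each node in the bundle to its length-L bit string, and `query(S)` abstracts an edge-detecting test on the node subset S):

```python
def locate_edge(node_bits, query):
    """Recover the bit strings A, B of the two endpoints of a bundle's unique edge.
    node_bits: dict node -> tuple of L bits; query: set of nodes -> bool outcome."""
    nodes = list(node_bits)
    L = len(node_bits[nodes[0]])
    A, B = [None] * L, [None] * L
    split = None  # first position where A and B were found to differ
    for l in range(L):
        if query({u for u in nodes if node_bits[u][l] == 0}):
            A[l] = B[l] = 0  # Step 1: both endpoints have a 0 in bit l
        elif query({u for u in nodes if node_bits[u][l] == 1}):
            A[l] = B[l] = 1  # Step 2: both endpoints have a 1 in bit l
        elif split is None:
            split, A[l], B[l] = l, 0, 1  # Step 3(a): first differing position
        else:
            # Step 3(b): test nodes whose bits at positions l and split agree
            v = A[split]
            same = query({u for u in nodes
                          if node_bits[u][l] == node_bits[u][split]})
            A[l], B[l] = (v, 1 - v) if same else (1 - v, v)
    return tuple(A), tuple(B)
```

In Step 3(b), the test is positive exactly when A repeats its bit from position ℓ′ at position ℓ (and likewise for B), which is what the branch on `same` encodes.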
Moreover, we don't have to look at all their outcomes, but rather only those that we would have taken in the adaptive setting. Hence, the number of group tests per location test is at most 2⌈log₂ n⌉ + 2 · (1/2)⌈log₂ n⌉² = (log₂ n)²(1 + o(1)), so if we perform one location test for each bundle (and again only actually use those that we need) then the total is B(log₂ n)²(1 + o(1)).

Figure 2: Performance of the COMP, DD, LP, and SSS algorithms for noiseless group testing under Bernoulli testing with ν = 1, and with n = 50 and k = 5 (Left); n = 200 and k = 200 (Right).

6.4 Proof Step 4 – Total Number of Tests and Decoding Time
The claims on the number of tests and runtime stated in Theorem 5 follow easily from the above analysis, and the details are deferred to Appendix F.

7 Numerical Experiments
We complement our theoretical findings with numerical experiments comparing COMP, DD, SSS, and a linear programming (LP) relaxation of SSS (analogous to [34]).⁵ Figure 2 shows the success probability as a function of the number of tests in two cases: (i) n = 50 and k = 5; (ii) n = 200 and k = 200. In each case, we set ν = 1 and compute the error probability averaged over 2000 trials. In the first case, we observe that the SSS and LP curves are very close, and require the fewest tests; DD requires more tests, and COMP requires the most. In the second case, we omit SSS due to its computational complexity, but we observe a similar ordering between LP, DD, and COMP.
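For concreteness, COMP is the simplest of the compared algorithms; the standard group-testing rule (as used in the group testing literature, with items here playing the role of edges) rules out any item appearing in a negative test and declares everything else defective. The sketch below uses illustrative names, not the paper's code; `tests` is a list of (pool, outcome) pairs.

```python
def comp_decode(n_items, tests):
    """Standard COMP rule: an item is declared non-defective if it appears
    in any negative test; all remaining items are declared defective.

    tests: list of (pool, outcome) pairs, where pool is a set of item
    indices and outcome is True for a positive test.
    """
    possibly_defective = set(range(n_items))
    for pool, outcome in tests:
        if not outcome:
            # every item in a negative pool is definitely non-defective
            possibly_defective -= pool
    return possibly_defective
```

COMP never misses a true defective, but can over-report when an item happens to appear only in positive pools; this conservatism is why it requires the most tests in both panels of Figure 2.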
In both cases, the relative performance between the algorithms is consistent with our theoretical findings:
• The first case is a sparse setting, and the performance curves for COMP and DD are relatively closer, which is consistent with the fact that COMP and DD achieve the same theoretical bound in the sparse limit θ → 0 (see Figure 1).
• The second case is a denser setting, and the gap between LP and DD is narrower, which is consistent with the fact that the theoretical bounds for DD and SSS coincide in denser regimes.

In Appendix G, we provide similar plots for varying choices of (k, n) in order to demonstrate that the dependence of the number of tests on k and n is in general agreement with our theory.

8 Conclusion
We have studied the problem of learning Erdős-Rényi random graphs via edge detecting queries, and demonstrated significantly improved scaling of O(k̄ log n) compared to worst-case graphs with k edges. We provided order-optimal bounds for the COMP, DD, and SSS algorithms with explicit constants, showed DD to be optimal under Bernoulli testing when the graph is sufficiently dense (θ ≥ 1/2), and introduced a sublinear-time algorithm that succeeds with O(k̄ log² k̄ + k̄ log n) runtime. Given that the ideas of this paper build on a variety of techniques for small-error group testing, it is natural to pursue further research in directions that were done previously in that setting, including separate decoding of items (edges) [35, 38], information-theoretic achievability bounds [20, 37], and near-constant tests-per-item (tests-per-node) designs [20, 31]. Generalizations of our techniques to hypergraph learning [2, 11, 12, 24] would also be of significant interest.

Acknowledgment. This work was supported by an NUS Early Career Research Award.

⁵The code is available at https://github.com/scarlett-nus/er_edge_det.

References

[1] H. Abasi and N. H.
Bshouty, \u201cOn learning graphs with edge-detecting queries,\u201d 2018,\n\nhttps://arxiv.org/abs/1803.10639.\n\n[2] H. Abasi, N. H. Bshouty, and H. Mazzawi, \u201cNon-adaptive learning of a hidden hypergraph,\u201d\n\nTheoretical Comp. Sci., vol. 716, pp. 15\u201327, 2018.\n\n[3] M. Aigner, \u201cSearch problems on graphs,\u201d Disc. Appl. Math., vol. 14, no. 3, pp. 215\u2013230, 1986.\n[4] M. Aigner and E. Triesch, \u201cSearching for an edge in a graph,\u201d Journal of Graph Theory, vol. 12,\n\nno. 1, pp. 45\u201357, 1988.\n\n[5] M. Aldridge, \u201cIndividual testing is optimal for nonadaptive group testing in the linear regime,\u201d\n\nIEEE Trans. Inf. Theory, vol. 65, no. 4, pp. 2058\u20132061, April 2019.\n\n[6] M. Aldridge, L. Baldassini, and O. Johnson, \u201cGroup testing algorithms: Bounds and simulations,\u201d\n\nIEEE Trans. Inf. Theory, vol. 60, no. 6, pp. 3671\u20133687, June 2014.\n\n[7] M. Aldridge, \u201cThe capacity of Bernoulli nonadaptive group testing,\u201d IEEE Trans. Inf. Theory,\n\nvol. 63, no. 11, pp. 7142\u20137148, 2017.\n\n[8] M. Aldridge, O. Johnson, and J. Scarlett, \u201cGroup testing: An information theory perspective,\u201d\n\n2019, https://arxiv.org/abs/1902.06002.\n\n[9] N. Alon and V. Asodi, \u201cLearning a hidden subgraph,\u201d SIAM J. Disc. Math., vol. 18, no. 4, pp.\n\n697\u2013712, 2005.\n\n[10] N. Alon, R. Beigel, S. Kasif, S. Rudich, and B. Sudakov, \u201cLearning a hidden matching,\u201d SIAM\n\nJ. Comp., vol. 33, no. 2, pp. 487\u2013501, 2004.\n\n[11] D. Angluin and J. Chen, \u201cLearning a hidden hypergraph,\u201d J. Mach. Learn. Res., vol. 7, no. Oct.,\n\npp. 2215\u20132236, 2006.\n\n[12] D. Angluin and J. Chen, \u201cLearning a hidden graph using o(log n) queries per edge,\u201d J. Comp.\n\nSys. Sci., vol. 74, no. 4, pp. 546\u2013556, 2008.\n\n[13] L. Baldassini, O. Johnson, and M. Aldridge, \u201cThe capacity of adaptive group testing,\u201d in IEEE\n\nInt. Symp. Inf. Theory, July 2013, pp. 
2676–2680.

[14] B. Bollobás, Random Graphs. Cambridge University Press, 2001, vol. 73.

[15] S. Bondorf, B. Chen, J. Scarlett, H. Yu, and Y. Zhao, "Sublinear-time non-adaptive group testing with O(k log n) tests via bit-mixing coding," 2019, https://arxiv.org/abs/1904.10102.

[16] M. Bouvel, V. Grebinski, and G. Kucherov, "Combinatorial search on graphs motivated by bioinformatics applications: A brief survey," in Int. Workshop Graph-Theoretic Concepts in Comp. Sci. Springer, 2005, pp. 16–27.

[17] N. H. Bshouty, "Linear time constructions of some d-restriction problems," in Int. Conf. Algs. and Complexity. Springer, 2015, pp. 74–88.

[18] S. Cai, M. Jahangoshahi, M. Bakshi, and S. Jaggi, "Efficient algorithms for noisy group testing," IEEE Trans. Inf. Theory, vol. 63, no. 4, pp. 2113–2136, 2017.

[19] C. L. Chan, P. H. Che, S. Jaggi, and V. Saligrama, "Non-adaptive probabilistic group testing with noisy measurements: Near-optimal bounds with efficient algorithms," in Allerton Conf. Comm., Ctrl., Comp., Sep. 2011, pp. 1832–1839.

[20] A. Coja-Oghlan, O. Gebhard, M. Hahn-Klimroth, and P. Loick, "Information-theoretic and algorithmic thresholds for group testing," in Int. Colloq. Aut., Lang. and Prog. (ICALP), 2019.

[21] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, Inc., 2006.

[22] D. de Caen, "A lower bound on the probability of a union," Discrete Mathematics, vol. 169, no. 1, pp. 217–220, 1997.

[23] D. Du and F. K. Hwang, Combinatorial Group Testing and its Applications. World Scientific, 2000, vol. 12.

[24] A. G. D'yachkov, I. V. Vorobyev, N. Polyanskii, and V. Y. Shchukin, "On multistage learning a hidden hypergraph," in IEEE Int. Symp. Inf. Theory (ISIT), 2016.

[25] J. Galambos and I.
Simonelli, Bonferroni-type Inequalities with Applications. Springer-Verlag, 1996.

[26] E. N. Gilbert, "Random graphs," Ann. Math. Stats., vol. 30, no. 4, pp. 1141–1144, Dec. 1959.

[27] V. Grebinski and G. Kucherov, "Reconstructing a Hamiltonian cycle by querying the graph: Application to DNA physical mapping," Disc. Appl. Math., vol. 88, no. 1-3, pp. 147–165, 1998.

[28] F. K. Hwang and D. Du, Pooling Designs and Nonadaptive Group Testing: Important Tools for DNA Sequencing. World Scientific, 2006, vol. 18.

[29] H. A. Inan, P. Kairouz, M. Wootters, and A. Ozgur, "On the optimality of the Kautz-Singleton construction in probabilistic group testing," 2019, IEEE Trans. Inf. Theory (to appear).

[30] P. Johann, "A group testing problem for graphs with several defective edges," Discrete Applied Mathematics, vol. 117, no. 1-3, pp. 99–108, 2002.

[31] O. Johnson, M. Aldridge, and J. Scarlett, "Performance of group testing algorithms with near-constant tests-per-item," IEEE Trans. Inf. Theory, vol. 65, no. 2, pp. 707–723, Feb. 2019.

[32] H. Kameli, "Non-adaptive group testing on graphs," Disc. Math. and Theor. Comp. Sci., vol. 20, no. 1, 2018.

[33] K. Lee, R. Pedarsani, and K. Ramchandran, "SAFFRON: A fast, efficient, and robust framework for group testing based on sparse-graph codes," 2015, http://arxiv.org/abs/1508.04485.

[34] D. Malioutov and M. Malyutov, "Boolean compressed sensing: LP relaxation for group testing," in IEEE Int. Conf. Acoust. Sp. Sig. Proc. (ICASSP), March 2012, pp. 3305–3308.

[35] M. B. Malyutov and P. S. Mateev, "Screening designs for non-symmetric response function," Mat. Zametki, vol. 29, pp. 109–127, 1980.

[36] R. Motwani and P. Raghavan, Randomized Algorithms. Chapman & Hall/CRC, 2010.

[37] J. Scarlett and V. Cevher, "Phase transitions in group testing," in Proc.
ACM-SIAM Symp. Disc. Alg. (SODA), 2016.

[38] J. Scarlett and V. Cevher, "Near-optimal noisy group testing via separate decoding of items," IEEE Trans. Sel. Topics Sig. Proc., vol. 2, no. 4, pp. 625–638, 2018.

[39] J. Scarlett and V. Cevher, "An introductory guide to Fano's inequality with applications in statistical estimation," 2019, https://arxiv.org/abs/1901.00555.

[40] J. Scarlett and O. Johnson, "Noisy non-adaptive group testing: A (near-)definite defectives approach," 2018, https://arxiv.org/abs/1808.09143.

[41] K. Shanmugam, R. Tandon, A. Dimakis, and P. Ravikumar, "On the information theoretic limits of learning Ising models," in Adv. Neur. Inf. Proc. Sys. (NIPS), 2014.