{"title": "Bounding the Cost of Search-Based Lifted Inference", "book": "Advances in Neural Information Processing Systems", "page_first": 946, "page_last": 954, "abstract": "Recently, there has been growing interest in systematic search-based and importance sampling-based lifted inference algorithms for statistical relational models (SRMs). These lifted algorithms achieve significant complexity reductions over their propositional counterparts by using lifting rules that leverage symmetries in the relational representation. One drawback of these algorithms is that they use an inference-blind representation of the search space, which makes it difficult to efficiently pre-compute tight upper bounds on the exact cost of inference without running the algorithm to completion. In this paper, we present a principled approach to address this problem. We introduce a lifted analogue of the propositional And/Or search space framework, which we call a lifted And/Or schematic. Given a schematic-based representation of an SRM, we show how to efficiently compute a tight upper bound on the time and space cost of exact inference from a current assignment and the remaining schematic. We show how our bounding method can be used within a lifted importance sampling algorithm, in order to perform effective Rao-Blackwellisation, and demonstrate experimentally that the Rao-Blackwellised version of the algorithm yields more accurate estimates on several real-world datasets.", "full_text": "Bounding the Cost of Search-Based Lifted Inference

David Smith
University of Texas at Dallas
800 W Campbell Rd, Richardson, TX 75080
dbs014200@utdallas.edu

Vibhav Gogate
University of Texas at Dallas
800 W Campbell Rd, Richardson, TX 75080
vibhav.gogate@utdallas.edu

Abstract

Recently, there has been growing interest in systematic search-based and importance sampling-based lifted inference algorithms for statistical relational models (SRMs). 
These lifted algorithms achieve significant complexity reductions over their propositional counterparts by using lifting rules that leverage symmetries in the relational representation. One drawback of these algorithms is that they use an inference-blind representation of the search space, which makes it difficult to efficiently pre-compute tight upper bounds on the exact cost of inference without running the algorithm to completion. In this paper, we present a principled approach to address this problem. We introduce a lifted analogue of the propositional And/Or search space framework, which we call a lifted And/Or schematic. Given a schematic-based representation of an SRM, we show how to efficiently compute a tight upper bound on the time and space cost of exact inference from a current assignment and the remaining schematic. We show how our bounding method can be used within a lifted importance sampling algorithm, in order to perform effective Rao-Blackwellisation, and demonstrate experimentally that the Rao-Blackwellised version of the algorithm yields more accurate estimates on several real-world datasets.

1 Introduction

A myriad of probabilistic logic languages have been proposed in recent years [5, 12, 17]. These languages can express elaborate models with a compact specification. Unfortunately, performing efficient inference in these models remains a challenge. Researchers have attacked this problem by "lifting" propositional inference techniques; lifted algorithms identify indistinguishable random variables and treat them as a single block at inference time, which can yield significant reductions in complexity. Since the original proposal by Poole [15], a variety of lifted inference algorithms have emerged. 
One promising approach is the class of search-based algorithms [8, 9, 16, 19, 20, 21], which lift propositional weighted model counting [4, 18] to the first-order level by transforming the propositional search space into a smaller lifted search space.

In general, exact lifted inference remains intractable. As a result, there has been growing interest in developing approximate algorithms that take advantage of symmetries. In this paper, we focus on a class of such algorithms, called lifted sampling methods [9, 10, 13, 14, 22], and in particular on the lifted importance sampling (LIS) algorithm [10]. LIS can be understood as a sampling analogue of an exact lifted search algorithm called probabilistic theorem proving (PTP). PTP accepts an SRM as input (e.g., as a Markov Logic Network (MLN) [17]), decides upon a lifted inference rule to apply (conditioning, decomposition, partial grounding, etc.), constructs a set of reduced MLNs, recursively calls itself on each reduced MLN in this set, and combines the returned values in an appropriate manner. A drawback of PTP is that the MLN representation of the search space is inference unaware; at any step in PTP, the cost of inference over the remaining model is unknown. This is problematic because, unlike (propositional) importance sampling algorithms for graphical models, which can be Rao-Blackwellised [3] in a principled manner by sampling variables until the treewidth of the remaining model is bounded by a small constant (called w-cutset sampling [1]), it is currently not possible to Rao-Blackwellise LIS in a principled manner. To address these limitations, we make the following contributions:

1. We propose an alternate, inference-aware representation of the lifted search space that allows efficient computation of the cost of inference at any step of the PTP algorithm. Our approach is based on the And/Or search space perspective [6]. 
Propositional And/Or search associates a compact representation of a search space with a graphical model (called a pseudotree), and then uses this representation to guide a weighted model counting algorithm over the full search space. We extend this notion to lifted And/Or search spaces. We associate with each SRM a schematic, which describes the associated lifted search space in terms of lifted Or nodes, which represent branching on counting assignments [8] to groups of indistinguishable variables, and lifted And nodes, which represent decompositions over independent and (possibly) identical subproblems. Our formal specification of lifted And/Or search spaces offers an intermediate representation of SRMs that bridges the gap between high-level probabilistic logics such as Markov Logic [17] and the search space representation that must be explored at inference time.

2. We use the intermediate specification to characterize the size of the search space associated with an SRM without actually exploring it, providing tight upper bounds on the complexity of PTP. This allows us, in principle, to develop advanced approximate lifted inference algorithms that take advantage of exact lifted inference whenever they encounter tractable subproblems.

3. We demonstrate the utility of our lifted And/Or schematic and tight upper bounds by developing a Rao-Blackwellised lifted importance sampling algorithm, enabling the user to systematically explore the accuracy versus complexity trade-off. We demonstrate experimentally that it vastly improves the accuracy of estimation on several real-world datasets.

2 Background and Terminology

And/Or Search Spaces. The And/Or search space model is a general perspective for searching over graphical models, including both probabilistic networks and constraint networks [6]. 
And/Or search spaces allow many familiar graph notions to be used to characterize algorithmic complexity. Given a graphical model M = ⟨G, Φ⟩, where G = ⟨V, E⟩ is a graph and Φ is a set of features or potentials, and a rooted tree T that spans G in such a manner that the edges of G that are not in T are all back-edges (i.e., T is a pseudo tree [6]), the corresponding And/Or search space, denoted S_T(R), contains alternating levels of And nodes and Or nodes. Or nodes are labeled with Xi, where Xi ∈ vars(Φ). And nodes are labeled with xi and correspond to assignments to Xi. The root of the And/Or search tree is an Or node corresponding to the root of T.

Intuitively, the pseudo tree can be viewed as a schematic for the structure of an And/Or search space associated with a graphical model, which denotes (1) the conditioning order on the set vars(Φ), and (2) the locations along this ordering at which the model decomposes into independent subproblems. Given a pseudotree, we can generate the corresponding And/Or search tree via a straightforward algorithm [6] that adds conditioning branches to the pseudo tree representation during a DFS walk over the structure. Adding a cache that stores the value of each subproblem (keyed by an assignment to its context) allows each subproblem to be computed just once, and converts the search tree into a search graph. Thus the cost of inference is encoded in the pseudo tree. In Section 3, we define a lifted analogue of the backbone pseudo tree, called a lifted And/Or schematic, and in Section 4, we use the definition to prove cost-of-inference bounds for probabilistic logic models.

First-Order Logic. An entity (or a constant) is an object in the model about which we would like to reason. Each entity has an associated type, τ. The set of all unique types forms the set of base types for the model. 
A domain is a set of entities of the same type τ; we assume that each domain is finite and is disjoint from every other domain in the model. A variable, denoted by a lower-case letter, is a symbolic placeholder that specifies where a substitution may take place. Each variable is associated with a type τ; a valid substitution requires that a variable be replaced by an object (either an entity or another variable) with the same type. We denote the domain associated with a variable v by ∆v.

We define a predicate, denoted by R(t1 :: τ1, . . . , tk :: τk), to be a k-ary functor that maps typed entities to binary-valued random variables (also called a parameterized random variable [15]). A substitution is an expression of the form {t1 = x1, . . . , tk = xk}, where the ti are variables of type τi and the xi are either entities or variables of type τi. Given a predicate R and a substitution θ = {t1 = x1, . . . , tk = xk}, the application of θ to R yields another k-ary functor with each ti replaced by xi, called an atom. If all the xi are entities, the application yields a random variable. In this case, we refer to θ as a grounding of R, and to Rθ as a ground atom. We adopt the notation θi to refer to the i-th assignment of θ, i.e. θi = xi.

Statistical Relational Models combine first-order logic and probabilistic graphical models. A popular SRM is Markov logic networks (MLNs) [17]. An MLN is a set of weighted first-order logic clauses. Given entities, the MLN defines a Markov network over all the ground atoms in its Herbrand base (cf. [7]), with a feature corresponding to each ground clause in the Herbrand base. (We assume Herbrand interpretations throughout this paper.) The weight of each feature is the weight of the corresponding first-order clause. 
The probability distribution associated with the Markov network is given by: P(x) = (1/Z) exp(∑_i w_i n_i(x)), where w_i is the weight of the i-th clause and n_i(x) is its number of true groundings in x, and Z = ∑_x exp(∑_i w_i n_i(x)) is the partition function. In this paper, we focus on computing Z. It is known that many inference problems over MLNs can be reduced to computing Z.

Probabilistic Theorem Proving (PTP) [9] is an algorithm for computing Z in MLNs. It lifts the two main steps in propositional inference: conditioning (Or nodes) and decomposition (And nodes). In lifted conditioning, the set of truth assignments to ground atoms of a predicate R is partitioned into multiple parts such that in each part (1) all truth assignments have the same number of true atoms and (2) the MLNs obtained by applying the truth assignments are identical. Thus, if R has n ground atoms, the lifted search procedure will search over O(n + 1) new MLNs while the propositional search procedure will search over O(2^n) MLNs, an exponential reduction in complexity. In lifted decomposition, the MLN is partitioned into a set of MLNs that are not only identical (up to a renaming) but also disjoint in the sense that they do not share any ground atoms. Thus, unlike the propositional procedure, which creates n disjoint MLNs and searches over each, the lifted procedure searches over just one of the n MLNs (since they are identical). Unfortunately, lifted decomposition and lifted conditioning cannot always be applied, and in such cases PTP resorts to propositional conditioning and decomposition. 
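As a quick numerical check of the exponential reduction described above, the following sketch (a toy single-clause MLN of our own construction, not the paper's code) computes Z of an MLN with one weighted unit clause R(x) both propositionally and via lifted conditioning; the two agree because all assignments with the same number of true atoms are symmetric:

```python
import math
from itertools import product

# Toy MLN with a single weighted unit clause R(x) and n ground atoms.
def z_propositional(n, w):
    # Propositional search: enumerate all 2^n truth assignments.
    return sum(math.exp(w * sum(x)) for x in product([0, 1], repeat=n))

def z_lifted(n, w):
    # Lifted conditioning: branch on the count k of true atoms (n + 1
    # branches) and weight each branch by C(n, k) symmetric assignments.
    return sum(math.comb(n, k) * math.exp(w * k) for k in range(n + 1))
```

For example, `z_lifted(6, 0.5)` matches `z_propositional(6, 0.5)` while touching 7 branches instead of 64.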
A drawback of PTP is that, unlike propositional And/Or search, which has tight complexity guarantees (e.g., exponential in the treewidth and pseudotree height), there are no (tight) formal guarantees on the complexity of PTP.¹ We address this limitation in the next two sections.

3 Lifted And/Or Schematics

Our goal in this section is to define a lifted analogue of the pseudotree notion employed by the propositional And/Or framework. The structure must encode (1) all information contained in a propositional pseudotree (a conditioning order, conditional independence assumptions), as well as (2) additional information needed by the PTP algorithm in order to exploit the symmetries of the lifted model. Since the symmetries that can be exploited highly depend on the amount of evidence, we encode the SRM after evidence is instantiated, via a process called shattering [2]. Thus, while a pseudotree encodes a graphical model, a schematic encodes an (SRM, evidence set) pair.

Definition A lifted Or node is a vertex labeled by a 6-tuple ⟨R, Θ, α, i, c, t⟩, where (1) R is a k-ary predicate, (2) Θ is a set of valid substitutions for R, (3) α ∈ {1, . . . , k} represents the counting argument for the predicate R(t1 :: τ1, . . . , tk :: τk) and specifies a domain τα to be counted over, (4) i is an identifier of the block of the partition being counted over, (5) c ∈ ℤ+ is the number of entities in block i, and (6) t ∈ {True, False, Unknown} is the truth value of the set of entities in block i.

Definition A lifted And node is a vertex labeled by F, a (possibly empty) set of formulas, where a formula f is a pair ({(O, θ, b)}, w), in which O is a lifted Or node ⟨R, Θ, α, i, c, t⟩, θ ∈ Θ, b ∈ {True, False}, and w ∈ ℝ. 
Formulas are assumed to be in clausal form.

Definition A lifted And/Or schematic, S = ⟨VS, ES, vr⟩, is a rooted tree comprised of lifted Or nodes and lifted And nodes. S must obey the following properties:
• Every lifted Or node O ∈ VS has a single child node N ∈ VS.
• Every lifted And node A ∈ VS has a (possibly empty) set of children {N1, . . . , Nn} ⊂ VS.

Figure 1: Possible schematics for (a) R(x) ∨ S(x), (b) R(x) ∨ S(x, y), and (c) R(x) ∨ R(y) ∨ S(x, y), with ∆x = ∆y = 2. UN stands for unknown. Circles and diamonds represent lifted Or and And nodes respectively.

¹Although complexity bounds exist for related inference algorithms, such as first-order decomposition trees [20], they are not as tight as the ones presented in this paper.

• For each pair of lifted Or nodes O, O′ ∈ VS, with respective labels ⟨R, Θ, α, i, c, t⟩ and ⟨R′, Θ′, α′, i′, c′, t′⟩, (R, i) ≠ (R′, i′). Pairs (R, i) uniquely identify lifted Or nodes.
• For every lifted Or node O ∈ VS with label ⟨R, Θ, α, i, c, t⟩, ∀θ ∈ Θ and ∀α′ ≠ α, either (1) ∆θα′ = 1, or (2) θα′ ∈ X, where X has appeared as the decomposer label [9] of some edge in pathS(O, vr).
• For each formula fi = ({(O, θ, b)}, w) appearing at a lifted And node A, ∀O ∈ {(O, θ, b)}, O ∈ pathS(vr, A). We call the set of edges {(O, A) | O ∈ Formulas(A)} the back edges of S.
• Each edge between a lifted Or node O and its child node N is unlabeled. 
Each edge between a lifted And node A and its child node N may be (1) unlabeled or (2) labeled with a pair (X, c), where X is a set of variables, called a decomposer set, and c ∈ ℤ+ is the number of equivalent entities in the block of x represented by the subtree below. If it is labeled with a decomposer set X, then (a) for every substitution set Θ labeling a lifted Or node O′ appearing in the subtree rooted at N, ∃i s.t. ∀θ ∈ Θ, θi ∈ X, and (b) for all decomposer sets Y labeling edges in the subtree rooted at N, Y ∩ X = ∅.

The lifted And/Or schematic is a general structure for specifying the inference procedure in SRMs. It can encode models specified in many formats, such as Markov Logic [17] and PRV models [15]. Given a model and evidence set, conversion of a schematic into a canonical form is achieved via shattering [2, 11], whereby exchangeable variables are grouped together. Inference only requires information on the size of these groups, so the representation omits information on the specific variables in a given group. Figure 1 shows And/Or schematics for three MLNs.

Algorithm 1 Function evalNode(And)
1: Input: a schematic T with And root node, a counting store cs
2: Output: a real number w
3: N = root(T); w = 1
4: for formula f ∈ N do
5:   w = w × calculateWeight(f, cs)
6: cs1 = sumOutDoneAtoms(cs, N)
7: for child N′ of T do
8:   if (N, N′) has label ⟨V, b, cb⟩ then
9:     if no pair ⟨(V, b), cc⟩ with v ∈ V appears in cs1 then
10:      cs2 = cs1 ∪ ⟨(V, b), ⟨{}, {({}, cb)}⟩⟩
11:    else cs2 = cs1
12:    ⟨P, M⟩ = getCC(V, b, cs2)  // get cc for V
13:    for assignment (ai, ki) ∈ M do
14:      cs3 = updateCCAtDecomposer(cs2, V, v, (ai, 1))  // give v its own entry in cs
15:      w = w × evalNode(N′, cs3)^ki
16:  else
17:    w = w × evalNode(N′, cs1)
18: return w

Algorithm 2 Function evalNode(Or)
1: Input: a schematic T with Or node root, a counting store cs
2: Output: a real number w
3: if ⟨(root(T), cs), w⟩ ∈ cache then return w
4: ⟨R, Θ, α, b, c, t, P⟩ = root(T)
5: T′ = child(⟨R, Θ, α, b, c, t⟩, T)
6: V = {v | θ ∈ Θ, θα = v}
7: ⟨P, {⟨ai, ki⟩}⟩ = getCC(V, b)
8: w = 0
9: if t ∈ {True, False} then
10:   cs1 = updateCC(⟨P, M⟩, R, t)
11:   w = evalNode(T′, cs1)
12: else
13:   assigns = {{v1, . . . , vn} | vi ∈ {0, . . . , ki}}
14:   for {v1, . . . , vn} ∈ assigns do
15:     cs1 = updateCC(⟨P, M⟩, R, {v1, . . . , vn})
16:     w = w + (∏_{i=1}^{n} C(ki, vi)) · evalNode(T′, cs1)
17: insertCache(⟨R, Θ, α, b, c, t, P⟩, w)
18: return w

3.1 Lifted Node Evaluation Functions

We describe the inference procedure in Algorithms 1 and 2. We require the notion of a counting store in order to track counting assignments over the variables in the model. A counting store is a set of pairs ⟨(V, i), cc⟩, where V is a set of variables that are counted over together, i is a block identifier, and cc is a counting context. A counting context (introduced in [16]) is a pair ⟨Pr, M⟩, where Pr is a list of m predicates and M : {True, False}^m → ℕ is a map from truth assignments to Pr to a non-negative integer denoting the count of the number of entities in the i-th block of the partition of each v ∈ V that take that assignment. 
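The store-splitting behaviour of lifted conditioning can be sketched as follows; this is our own illustrative encoding of a counting context (a map from joint truth assignments to entity counts), not the paper's implementation. Conditioning a block on a predicate with unknown truth value splits every entry's count between True and False, weighting each split by the number of symmetric ground assignments it represents:

```python
from itertools import product
from math import comb

# Illustrative counting context: maps a joint truth assignment over the
# predicates conditioned so far to the number of entities with that
# assignment (hypothetical encoding, not the paper's data structure).
def condition_unknown(cc):
    # Conditioning on a predicate with unknown truth value splits every
    # entry's count between True and False; each branch is weighted by
    # the product of binomial coefficients over its splits.
    items = list(cc.items())
    for trues in product(*(range(count + 1) for _, count in items)):
        new_cc, weight = {}, 1
        for (assign, count), t in zip(items, trues):
            weight *= comb(count, t)
            if t:
                new_cc[assign + (True,)] = t
            if count - t:
                new_cc[assign + (False,)] = count - t
        yield weight, new_cc
```

Conditioning a single block of size c yields the c + 1 branches of lifted conditioning, with weights summing to 2^c, the number of propositional assignments covered.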
We initialize the algorithm by a call to Algorithm 1 with an appropriate schematic S and an empty counting store.

The lifted And node function (Algorithm 1) first computes the weight of any completely conditioned formulas. It then makes a set of evalNode calls for each of its children O; if (A, O) has decomposer label V, it makes a call for each assignment in each block of the partition of V; otherwise it makes a single call to O. The algorithm takes the product of the resulting terms along with the product of the weights and returns the result. The lifted Or node function (Algorithm 2) retrieves the set of all assignments previously made to its counting argument variable set; it then makes an evalNode call to its child for each completion of its assignment set that is consistent with its labeled truth value, and takes their weighted sum, where the weight is the number of symmetric assignments represented by each assignment completion.

The overall complexity depends on the number of entries in the counting store at each step of inference. Note that Algorithm 1 reduces the size of the store by summing out over atoms that leave context. Algorithm 2 increases the size of the store at atoms with unknown truth value by splitting the current assignment into True and False blocks w.r.t. its atom predicate. Atoms with known truth value leave the size of the store unchanged.

4 Complexity Analysis

Algorithms 1 and 2 describe a DFS-style traversal of the lifted search space associated with S. As our notion of complexity, we are interested in specifying the maximum number of times any node in VS is replicated during instantiation of the search space. We describe this quantity as SSN(S). 
Our goal in this section is to define the function SSN(S), which we refer to as the induced lifted width of S.

4.1 Computing the Induced Lifted Width of a Schematic

In the propositional And/Or framework, the inference cost of a pseudotree T is determined by DR, the tree decomposition of the graph G = ⟨Nodes(T), BackEdges(T)⟩ induced by the variable ordering attained by traversing T along any DFS ordering from root to leaves [6]. Inference is O(exp(w)), where w is the size of the largest cluster in DR. The analogous procedure in lifted And/Or requires additional information to be stored at each cluster. Lifted tree decompositions are identical to their propositional counterparts with two exceptions. First, each cluster Ci requires the ordering of its nodes induced by the original order of S. Second, each cluster Ci that contains a node which occurs after a decomposer label requires the inclusion of the decomposer label. Formally:

Definition The tree sequence TS associated with schematic S is a partially ordered set such that: (1) O ∈ Nodes(S) ⇒ O ∈ TS, (2) (A, N) with label l ∈ Edges(S) ⇒ (A, l) ∈ TS, and (3) Anc(N1, N2, S) ⇒ N1 < N2 in TS.

Definition The path sequence P associated with tree sequence TS of schematic S is any totally ordered subsequence of TS.

Definition Given a schematic S and its tree sequence TS, the lifted tree decomposition of TS, denoted DS, is a pair (C, T) in which C is a set of path sequences and T is a tree whose nodes are the members of C, satisfying the following properties: (1) ∀(O, A) ∈ BackEdges(P), ∃i s.t. O, A ∈ Ci; (2) ∀i, j, k s.t. Ck ∈ PathT(Ci, Cj), Ci ∩ Cj ⊆ Ck; (3) ∀A ∈ TS, O ∈ Ci, A < O ⇒ A ∈ Ci.

Given the partial ordering of nodes defined by S, each schematic S induces a unique lifted tree decomposition, DS. Computing SSN(S) requires computing max_{Ci∈C} SSC(Ci). 
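The cluster-level structure of the two cost measures can be sketched as follows (an illustrative aid with made-up numbers, not the paper's code): the propositional cost is exponential in the largest cluster, while the lifted cost takes, per cluster, a product of leaf counts over its dependent counting paths and then the maximum over clusters:

```python
import math

# Propositional And/Or with caching: cost is driven by the largest
# cluster of the induced tree decomposition (binary variables assumed).
def propositional_cost(clusters):
    return max(2 ** len(cluster) for cluster in clusters)

# Lifted analogue: each cluster contributes the product of per-block
# leaf counts (Equation (1) structure); SSN is the max over clusters.
def lifted_cost(cluster_block_leaves):
    return max(math.prod(blocks) for blocks in cluster_block_leaves)
```

For instance, `propositional_cost([{"A", "B"}, {"B", "C"}])` is 4, and `lifted_cost([[10], [3, 2]])` is 10.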
There exists a total ordering over the nodes in each Ci; hence the lifted structure in each Ci constitutes a path. We take the lifted search space generated by each cluster C to be a tree; hence computing the maximum node replication is equivalent to computing the number of leaves in SSC.

In order to calculate the induced lifted width of a given path, we must first determine which Or nodes are counted over dependently. Let VC = {v | ⟨R, Θ, α, i, c, t⟩ ∈ C, θ ∈ Θ, θα = v} be the set of variables that are counted over by an Or node in cluster C. Let 𝒱C be a partition of VC into its dependent variable counting sets; i.e., define the binary relation CS = {(v1, v2) | ∃⟨R, Θ, α, i, c, t⟩ ∈ VS s.t. ∃θ, θ′ ∈ Θ, θα = v1, θ′α = v2}. Then V = {v′ | (v, v′) ∈ C+S}, where C+S is the transitive closure of CS. Let 𝒱C = {Vj | v1, v2 ∈ Vj ⇔ (v1, v2) ∈ C+S}. Variables that appear in a set Vj ∈ 𝒱C refer to the same set of objects; thus all have the same type τj and they all share the same partition of the entities of τj. Let Pj denote the partition of the entities of τj w.r.t. variable set Vj. Then each block pij ∈ Pj is counted over independently (we refer to each pij as a dependent counting path). Thus we can calculate the total leaves corresponding to cluster C by taking the product of the leaves of each pij block:

SSC(C) = ∏_{Vj∈𝒱C} ∏_{pij∈Pj} SSp(pij)    (1)

Analysis of lifted Or nodes that count over the same block pij depends on the structure of the decomposer sets over the structure. First, we consider the case in which C contains no decomposers.

4.2 Lifted Or Nodes with No Decomposer

Consider OR_{C,Vj,i}, the sequence of nodes in C that perform conditioning over the i-th block of the partition of the variables in Vj. The nodes in OR_{C,Vj,i} count over the same set of entities. 
A conditioning assignment at O assigns ct ∈ {0, . . . , c} entities to True and cf = c − ct entities to False w.r.t. its predicate, breaking the symmetry over the c elements in the block. Each O′ ∈ OR_{C,Vj,i} that occurs after O must perform counting over the two sets of size ct and cf separately. The number of assignments for block (Vj, i) grows exponentially with the number of ancestors counting over (Vj, i) whose truth value is unknown. Formally, let cij be the size of the i-th block of the partition of Vj, and let nij = |{O | O ∈ OR_{C,Vj,i}, O = ⟨R, Θ, α, i, c, Unknown⟩}|. For an initial domain size cij and predicate count nij, we must compute the number of possible ways to represent cij as a sum of 2^{nij} non-negative integers. Define kij = 2^{nij}. We can count the number of leaf nodes generated by counting the number of weak compositions of cij into kij parts. Thus the number of search space leaves corresponding to pij generated by C is:

SSp(pij) = W(cij, kij) = C(cij + kij − 1, kij − 1)    (2)

where C(n, k) denotes the binomial coefficient.

Example Consider the example in Figure 1(a). There is a single path from the root to a leaf. The set of variables appearing on the path is V = {x}, and hence the partition of V into variables that are counted over together yields {{x}}. Thus n1,1 = |{R1(2, Un), S1(2, Un)}| = 2, c1,1 = 2, and k1,1 = 4. So we can count the leaves of the model by the expression C(2 + 4 − 1, 4 − 1) = 5!/(3!2!) = 10.

4.3 Lifted Or Nodes with Decomposers

To determine the size of the search tree induced by a subsequence P that contains decomposers, we must consider whether the counting argument of each Or node is decomposed on.

Algorithm 3 Function countPathLeaves
1: Input: a subsequence path P
2: Output: f(x) : ℤ+ → ℤ+, where x is a domain size and f(x) is the number of search space leaves generated by P
3: // we represent the recursive polynomial a(wc1 − wc2) as a triple (a, wc1, wc2), where a ∈ ℤ and wc1, wc2 are either weak compositions (base case) or triples of this type (recursive case)
4: type WCP = WC INT | WCD (INT, WCP, WCP)
5: // makePoly constructs the polynomial
6: function MAKEPOLY((WC n), (t, a, s))
7:   return WCD(n / 2^{t−a}, WC n, WC (n − 2^{t−a}))
8: function MAKEPOLY((WCD (c, wc1, wc2)), (t, a, s))
9:   return WCD(a, makePoly wc1 (t, a, s), makePoly wc2 (t − s, a − s, s))
10: // applyDec divides out the Or nodes with counting variables that are decomposers
11: function APPLYDEC(d, (WC a))
12:   return WC (a / 2^d)
13: function APPLYDEC(d, (WCD (a, b, c)))
14:   return WCD(a, applyDec d b, applyDec d c)
15: // evalPoly creates a function that takes a domain and computes the differences of the constituent weak compositions
16: function EVALPOLY((WCD (a, b, c)), x)
17:   return a × (evalPoly b x − evalPoly c x)
18: function EVALPOLY((WC a), x)
19:   return C(x + a − 1, a − 1)
20: t = totalOrNodes(P)
21: dv = orNodesWithDecomposerCountingArgument(P)
22: poly = WC 2^t; orNodesAbove = 0; orNodesBetween = 0
23: for N of P do
24:   if N = (A, ⟨v, p, c⟩) then
25:     poly = makePoly poly (t, orNodesAbove, orNodesBetween)
26:     orNodesBetween = 0
27:   else
28:     orNodesAbove++; orNodesBetween++
29: return 2^{dv} × evalPoly (applyDec dv poly)

4.3.1 Lifted Or Nodes with Decomposers as Non-Counting Arguments

We first consider the case when OR_{C,Vj,i} contains decomposer variables as non-counting arguments. For each parent-to-child edge (A, N, label l), Algorithm 1 generates a child for each non-zero assignment in the counting store containing the decomposer variable. If a path subsequence over variable v of initial domain c has n Or nodes, k of which occur below the decomposer label, then we can compute the number of assignments in the counting store at each decomposer as 2^{n−k}. Further, the number of non-zero leaves generated by each assignment can be computed as the difference in leaves between the model over n Or nodes and the model over k Or nodes. Hence the resulting model has 2^{n−k} (W(c, 2^n) − W(c, 2^k)) leaves. This procedure can be repeated by recursively applying the rule to split each weak composition into a difference of weak compositions for each decomposer label present in the subsequence under consideration (Algorithm 3). The final result is a polynomial in c which, when given a domain size, returns the number of leaves generated by the path subsequence.

Example Consider the example in Figure 1(c). Again there is a single path from the root to a leaf. The set of variables appearing on the path is V = {x, y}. The partition of V into variables that are counted over together yields 𝒱 = {{x, y}}. Algorithm 3 returns the polynomial f(x) = 2(W(x, 4) − W(x, 2)). So the search space contains 2(C(2 + 4 − 1, 4 − 1) − C(2 + 2 − 1, 2 − 1)) = 14 leaves.

4.3.2 Lifted Or Nodes with Decomposers as Counting Arguments

The procedure is similar for the case when P contains Or nodes that count over variables that have been decomposed on, with one addition. Or nodes that count over a variable that has previously appeared as the decomposer label of an ancestor in the path have a domain size of 1 and hence always spawn W(1, 2) = 2 children instead of W(x, 2) children. If there are d Or nodes in P that count over decomposed variables, we must divide the k term of each weak composition in our polynomial by 2^d. Lines 11-14 of Algorithm 3 perform this operation.

Example Consider the example shown in Figure 1(b). Again there is one path from the root to a leaf, with V = {x, y}; partitioning V into sets of variables that are counted over together yields 𝒱 = {{x}, {y}}. Thus n1,1 = |{R1(2, Un)}| = 1, c1,1 = 2, and k1,1 = 2. Similarly, n2,1 = |{S1(2, Un)}| = 1, c2,1 = 2, and k2,1 = 2. Algorithm 3 returns the constant functions f1(x) = f2(x) = 2 × W(x, 1) = 2. Equation 1 indicates that we take the product of these functions. So the search space contains 4 leaves regardless of the domain sizes of x and y.

4.4 Overall Complexity

Detailed analysis, as well as a proof of correctness of Algorithm 3, is given in the supplemental material. Here we give general complexity results.

Theorem 4.1 Given a lifted And/Or schematic S with associated tree decomposition DS = (C, T), the overall time and space complexity of inference in S is O(max_{Ci∈C} SSC(Ci)).

Algorithm 4 Function makeRaoFunction
1: Input: a schematic S
2: Output: f(x) : CS → ℤ+
3: // find the clusters of S
4: (C, T) = findTreeDecomposition(S)
5: sizef = {}
6: for Ci of C do
7:   P = dependentCountingPaths(Ci)
8:   cf = {}
9:   for (Vj, Pj) of P do
10:    fj = countPathLeaves(Pj)
11:    cf.append(⟨Vj, fj⟩)
12:  sizef.append(cf)
13: return sizef

Algorithm 5 Function evalRaoFunction
1: Input: a counting store cs, a list of lists of size functions sf
2: Output: s ∈ ℤ+, the cost of exact inference
3: clusterCosts = {}
4: for cfi of sf do
5:   clusterCost = 1
6:   for ⟨Vj, fj⟩ of cfi do
7:     assigns = getCC(Vj)
8:     for sk of assigns do
9:       clusterCost = clusterCost × fj(sk)
10:  clusterCosts.append(clusterCost)
11: return max(clusterCosts)

5 An Application: Rao-Blackwellised Importance Sampling

Rao-Blackwellisation [1, 3] is a variance-reduction technique which combines exact 
inference with sampling. The idea is to partition the ground atoms into two sets: a set of atoms, say X, that will be sampled, and a set of atoms, say Y, that will be summed out analytically using exact inference techniques. Typically, the accuracy improves (the variance decreases) as the cardinality of Y is increased. However, so does the cost of exact inference, which in turn decreases the accuracy because fewer samples are generated. Thus, there is a trade-off.
Rao-Blackwellisation is particularly useful in lifted sampling schemes because subproblems over large sets of random variables are often tractable (e.g., subproblems containing 2^n assignments can often be summed out in O(n) time via lifted conditioning, or in O(1) time via lifted decomposition). The approach presented in Section 3 is ideal for this task because Algorithm 3 returns a function that is specified at the schematic level rather than the search space level: computing the size of the remaining search space requires just the evaluation of a set of polynomials. In this section, we introduce our sampling scheme, which adds Rao-Blackwellisation to lifted importance sampling (LIS), as detailed in [9, 10]. Technically, LIS is a minor modification of PTP in which, instead of searching over all possible truth assignments to ground atoms via lifted conditioning, the algorithm generates a random truth assignment (lifted sampling) and weighs it appropriately to yield an unbiased estimate of the partition function.
5.1 Computing the size bounding function. Given a schematic S = ⟨VS, ES, vr⟩ to sample, we introduce a preprocessing step that constructs a size evaluation function for each v ∈ VS. Algorithm 4 details the process of creating the function for one node. It takes as input the schematic S rooted at v. It first finds the tree decomposition of S.
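The earlier claim that a subproblem containing 2^n assignments can often be summed out in O(n) time via lifted conditioning can be checked with a small numerical sketch. The function names and weights below are illustrative, not part of the paper's algorithms: a block of n exchangeable atoms is summed by grouping the 2^n assignments by the number k of true atoms and weighting each group by the binomial coefficient C(n, k).

```python
from itertools import product
from math import comb

def lifted_sum(n, w_true, w_false):
    # Lifted conditioning: group the 2**n truth assignments by the
    # number k of true atoms; each group has comb(n, k) symmetric
    # members, so the sum has only n + 1 terms.
    return sum(comb(n, k) * w_true**k * w_false**(n - k) for k in range(n + 1))

def ground_sum(n, w_true, w_false):
    # Propositional counterpart: enumerate all 2**n assignments.
    return sum(
        w_true**sum(a) * w_false**(n - sum(a))
        for a in product([0, 1], repeat=n)
    )

# O(n) terms vs O(2**n) terms, same result.
assert lifted_sum(10, 2.0, 3.0) == ground_sum(10, 2.0, 3.0)
```

The same grouping is what makes the size polynomials of Algorithm 3 cheap to evaluate: only the counts of the symmetric groups, not the individual assignments, enter the computation.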
The algorithm then finds the dependent paths in each cluster; finally, it applies Algorithm 3 to each dependent path and wraps the resulting function with the variable dependency. It returns a list of lists of (variable, function) pairs.
5.2 Importance Sampling at lifted Or Nodes. Importance sampling at lifted Or nodes is similar to its propositional analogue. Each lifted Or node is now specified by an 8-tuple ⟨R, Θ, α, i, c, t, Q, sf⟩, in which Q is the proposal distribution for (R, i) and sf is the output of Algorithm 4. The sampling algorithm takes an additional input, cb, specifying the complexity bound for Rao-Blackwellisation. Given an Or node where t = unknown, we first compute the cost of exact inference.
Algorithm 5 describes the procedure. It takes as input (1) the list of lists sf output by Algorithm 4, and (2) the counting store, detailing the counting assignments already made by the current sample. For each sublist in the input list, the algorithm evaluates each (variable, function) pair by (1) retrieving the list of current assignments from the counting store, (2) evaluating the function for the domain size of each assignment, and (3) computing the product of the results. Each of these values represents a bound on the cost of inference for a single cluster; Algorithm 5 returns c, the maximum of this list.
If c ≤ cb, we call evalNode(S); otherwise we sample assignment i from Q with probability qi, update the counting store with assignment i, and call sampleNode(S′), where S′ is the child schematic, yielding estimate ŵ of the partition function of S′.
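The weighting in this sampling step can be sketched as follows. The branch weights, uniform proposal, and function names are illustrative, not the paper's implementation; the point is that averaging ŵ/qi over many draws recovers the true sum of the branch values, which is the unbiasedness property LIS relies on.

```python
import random

def sample_or_node(weights, proposal):
    # Draw branch i with probability proposal[i], then return the
    # importance-weighted estimate weights[i] / proposal[i].
    i = random.choices(range(len(weights)), weights=proposal)[0]
    return weights[i] / proposal[i]

weights = [2.0, 3.0, 5.0]        # illustrative exact branch values, Z = 10
proposal = [1/3, 1/3, 1/3]       # uniform proposal, as in the experiments

random.seed(0)
n = 200_000
estimate = sum(sample_or_node(weights, proposal) for _ in range(n)) / n
# The running average converges to sum(weights).
```

A non-uniform proposal changes the per-sample variance but not the expected value, which is why the choice of Q affects only efficiency, not correctness.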
We then return δ̂S = ŵ/qi as the estimate of the partition function at S.
5.3 Importance Sampling at lifted And Nodes. Importance sampling at lifted And nodes differs from its propositional counterpart in that a decomposer-labeled edge (A, T) represents d distributions that are not only independent but also identical. Let A be a lifted And node that we wish to sample, with children S1, ..., Sk and corresponding decomposer labels d1, ..., dk (for each edge with no decomposer label, take di = 1). Then the estimator for the partition function at A is: δ̂A = ∏_{i∈{1..k}} ∏_{j∈{1..di}} δ̂Si.

6 Experiments

We ran our Rao-Blackwellised importance sampler on three benchmark SRMs and datasets: (1) the Friends, Smokers and Asthma MLN and dataset described in [19], (2) the WebKB MLN for collective classification, and (3) the Protein MLN, in which the task is to infer protein interactions from biological data. All models are available from www.alchemy.cs.washington.edu.
Setup. For each model, we set 10% randomly selected ground atoms as evidence, and designated them to have True value. We then estimated the partition function via our Rao-Blackwellised sampler with complexity bounds {0, 10, 100, 1000} (a bound of 0 yields the LIS algorithm). We used the uniform distribution as our proposal. We ran each sampler 50 times and computed the sample variance of the estimates.
Results. Figure 2 shows the sample variance of the estimators as a function of time. We see that the Rao-Blackwellised samplers typically have smaller variance than LIS. However, increasing the complexity bound typically does not improve the variance as a function of time (though the variance does improve as a function of the number of samples). Our results indicate that the structure of the model plays a role in determining the most efficient complexity bound for sampling.
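Why exact inference over a decomposer node is comparatively cheap follows from the And-node estimator above: an edge with decomposer label d stands for d independent, identical subproblems, so a single child value raised to the d-th power covers all d copies. A minimal sketch (hypothetical names; exact child values stand in here for the per-child estimates δ̂):

```python
from math import prod

def and_node_value(child_values, decomposer_labels):
    # Each decomposer label d_i means the edge represents d_i
    # independent, identical copies of child i; one evaluation of the
    # child, raised to the power d_i, covers all of them.
    return prod(v**d for v, d in zip(child_values, decomposer_labels))

# Two children; the first edge carries a decomposer over 100 objects,
# the second has no decomposer label (d = 1).
value = and_node_value([3.0, 7.0], [100, 1])
```

The cost is one evaluation per child regardless of the decomposer size, which is why larger complexity bounds pay off on models with large decomposers.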
In general, models with large decomposers, especially near the bottom of the schematic, will benefit from a larger complexity bound, because it is often more efficient to perform exact inference over a decomposer node.

Figure 2: Log variance as a function of time. (a) Friends and Smokers, Asthma: 2600 objects, 10% evidence. (b) WebKB: 410 objects, 10% evidence. (c) Protein: 550 objects, 10% evidence.

7 Conclusions and Future Work

In this work, we have presented an inference-aware representation of SRMs based on the And/Or framework. Using this framework, we have proposed an accurate and efficient method for bounding the cost of inference for the family of lifted conditioning based algorithms, such as Probabilistic Theorem Proving. Given a shattered SRM, we have shown how the method can be used to quickly identify tractable subproblems of the model. We have presented one immediate application of the scheme by developing a Rao-Blackwellised Lifted Importance Sampling algorithm, which uses our bounding scheme as a variance reducer.

Acknowledgments

We gratefully acknowledge the support of the Defense Advanced Research Projects Agency (DARPA) Probabilistic Programming for Advanced Machine Learning Program under Air Force Research Laboratory (AFRL) prime contract no. FA8750-14-C-0005.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the view of DARPA, AFRL, or the US government.

References

[1] B. Bidyuk and R. Dechter. Cutset sampling for Bayesian networks. Journal of Artificial Intelligence Research, 28:1–48, 2007.

[2] R. Braz, Eyal Amir, and Dan Roth. Lifted first-order probabilistic inference. In Proceedings of the 19th International Joint Conference on Artificial Intelligence, pages 1319–1325, 2005.

[3] George Casella and Christian P. Robert. Rao-Blackwellisation of sampling schemes. Biometrika, 83(1):81–94, 1996.

[4] M. Chavira and A. Darwiche. On probabilistic inference by weighted model counting. Artificial Intelligence, 172(6-7):772–799, 2008.

[5] Luc De Raedt and Kristian Kersting. Probabilistic Inductive Logic Programming. Springer, 2008.

[6] Rina Dechter and Robert Mateescu. And/Or search spaces for graphical models. Artificial Intelligence, 171(2):73–106, 2007.

[7] Michael R. Genesereth and Eric Kao. Introduction to Logic, Second Edition. Morgan & Claypool Publishers, 2013.

[8] Vibhav Gogate and Pedro Domingos. Exploiting logical structure in lifted probabilistic inference. In Statistical Relational Artificial Intelligence, 2010.

[9] Vibhav Gogate and Pedro Domingos. Probabilistic theorem proving.
In Proceedings of the Twenty-Seventh Annual Conference on Uncertainty in Artificial Intelligence (UAI-11), pages 256–265. AUAI Press, 2011.

[10] Vibhav Gogate, Abhay Kumar Jha, and Deepak Venugopal. Advances in lifted importance sampling. In AAAI, 2012.

[11] Abhay Jha, Vibhav Gogate, Alexandra Meliou, and Dan Suciu. Lifted inference seen from the other side: The tractable features. In Advances in Neural Information Processing Systems, pages 973–981, 2010.

[12] Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong, and Andrey Kolobov. BLOG: Probabilistic models with unknown objects. In Statistical Relational Learning, 2007.

[13] M. Niepert. Lifted probabilistic inference: An MCMC perspective. In UAI 2012 Workshop on Statistical Relational Artificial Intelligence, 2012.

[14] M. Niepert. Symmetry-aware marginal density estimation. In Twenty-Seventh AAAI Conference on Artificial Intelligence, pages 725–731, 2013.

[15] David Poole. First-order probabilistic inference. In IJCAI, volume 3, pages 985–991, 2003.

[16] David Poole, Fahiem Bacchus, and Jacek Kisynski. Towards completely lifted search-based probabilistic inference. arXiv preprint arXiv:1107.4035, 2011.

[17] Matthew Richardson and Pedro Domingos. Markov logic networks. Machine Learning, 62(1-2):107–136, 2006.

[18] T. Sang, P. Beame, and H. Kautz. Solving Bayesian networks by weighted model counting. In Proceedings of the Twentieth National Conference on Artificial Intelligence, pages 475–482, 2005.

[19] Dan Suciu, Abhay Jha, Vibhav Gogate, and Alexandra Meliou. Lifted inference seen from the other side: The tractable features. In NIPS, 2010.

[20] Nima Taghipour, Jesse Davis, and Hendrik Blockeel. First-order decomposition trees. In
In\n\nAdvances in Neural Information Processing Systems, pages 1052\u20131060, 2013.\n\n[21] Guy Van den Broeck, Nima Taghipour, Wannes Meert, Jesse Davis, and Luc De Raedt. Lifted\nprobabilistic inference by \ufb01rst-order knowledge compilation. In Proceedings of the Twenty-\nSecond international joint conference on Arti\ufb01cial Intelligence-Volume Volume Three, pages\n2178\u20132185. AAAI Press, 2011.\n\n[22] Deepak Venugopal and Vibhav Gogate. On lifting the gibbs sampling algorithm. In Advances\n\nin Neural Information Processing Systems, pages 1655\u20131663, 2012.\n\n9\n\n\f", "award": [], "sourceid": 607, "authors": [{"given_name": "David", "family_name": "Smith", "institution": "University of Texas at Dallas"}, {"given_name": "Vibhav", "family_name": "Gogate", "institution": "UT Dallas"}]}