{"title": "New Liftable Classes for First-Order Probabilistic Inference", "book": "Advances in Neural Information Processing Systems", "page_first": 3117, "page_last": 3125, "abstract": "Statistical relational models provide compact encodings of probabilistic dependencies in relational domains, but result in highly intractable graphical models. The goal of lifted inference is to carry out probabilistic inference without needing to reason about each individual separately, by instead treating exchangeable, undistinguished objects as a whole. In this paper, we study the domain recursion inference rule, which, despite its central role in early theoretical results on domain-lifted inference, has later been believed redundant. We show that this rule is more powerful than expected, and in fact significantly extends the range of models for which lifted inference runs in time polynomial in the number of individuals in the domain. This includes an open problem called S4, the symmetric transitivity model, and a first-order logic encoding of the birthday paradox. 
We further identify new classes S2FO2 and S2RU of domain-liftable theories, which respectively subsume FO2 and recursively unary theories, the largest classes of domain-liftable theories known so far, and show that using domain recursion can achieve exponential speedup even in theories that cannot fully be lifted with the existing set of inference rules.", "full_text": "New Liftable Classes for First-Order Probabilistic Inference\n\nSeyed Mehran Kazemi\nThe University of British Columbia\nsmkazemi@cs.ubc.ca\n\nAngelika Kimmig\nKU Leuven\nangelika.kimmig@cs.kuleuven.be\n\nGuy Van den Broeck\nUniversity of California, Los Angeles\nguyvdb@cs.ucla.edu\n\nDavid Poole\nThe University of British Columbia\npoole@cs.ubc.ca\n\nAbstract\n\nStatistical relational models provide compact encodings of probabilistic dependencies\nin relational domains, but result in highly intractable graphical models.\nThe goal of lifted inference is to carry out probabilistic inference without needing\nto reason about each individual separately, by instead treating exchangeable,\nundistinguished objects as a whole. In this paper, we study the domain recursion\ninference rule, which, despite its central role in early theoretical results on\ndomain-lifted inference, has later been believed redundant. We show that this\nrule is more powerful than expected, and in fact significantly extends the range\nof models for which lifted inference runs in time polynomial in the number of\nindividuals in the domain. This includes an open problem called S4, the symmetric\ntransitivity model, and a first-order logic encoding of the birthday paradox. 
We\nfurther identify new classes S 2FO 2 and S 2RU of domain-liftable theories, which\nrespectively subsume FO 2 and recursively unary theories, the largest classes of\ndomain-liftable theories known so far, and show that using domain recursion can\nachieve exponential speedup even in theories that cannot fully be lifted with the\nexisting set of inference rules.\n\n1 Introduction\n\nStatistical relational learning (SRL) [8] aims at unifying logic and probability for reasoning and\nlearning in noisy domains, described in terms of individuals (or objects), and the relationships\nbetween them. Statistical relational models [10], or template-based models [18] extend Bayesian and\nMarkov networks with individuals and relations, and compactly describe probabilistic dependencies\namong them. These models encode exchangeability among the objects: individuals that we have the\nsame information about are treated similarly.\nA key challenge with SRL models is the fact that they represent highly intractable, densely connected\ngraphical models, typically with millions of random variables. The aim of lifted inference [23] is to\ncarry out probabilistic inference without needing to reason about each individual separately, by instead\ntreating exchangeable, undistinguished objects as a whole. Over the past decade, a large number of\nlifted inference rules have been proposed [5, 9, 11, 14, 20, 22, 28, 30], often providing exponential\nspeedups for specific SRL models. These basic exact inference techniques have applications in\n(tractable) lifted learning [32], where the main task is to efficiently compute partition functions, and\nin variational and over-symmetric approximations [29, 33]. 
Moreover, they provided the foundation\nfor a rich literature on approximate lifted inference and learning [1, 4, 13, 17, 19, 21, 25, 34].\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nThe theoretical study of lifted inference began with the complexity notion of domain-lifted\ninference [31] (a concept similar to data complexity in databases). Inference is domain-lifted when it runs\nin time polynomial in the number of individuals in the domain. By identifying liftable classes of\nmodels, guaranteeing domain-lifted inference, one can characterize the theoretical power of the various\ninference rules. For example, the class FO 2, encoding dependencies among pairs of individuals\n(i.e., two logical variables), is liftable [30]. Kazemi and Poole [15] introduce a liftable class called\nrecursively unary, capturing hierarchical simplification rules. Beame et al. [3] identify liftable classes\nof probabilistic database queries. Such results elevate the specific inference rules and examples to a\ngeneral principle, and bring lifted inference in line with complexity and database theory [3].\nThis paper studies the domain recursion inference rule, which applies the principle of induction on\nthe domain size. The rule makes one individual A in the domain explicit. Afterwards, the other\ninference rules simplify the SRL model up to the point where it becomes identical to the original\nmodel, except the domain size has decreased. Domain recursion was introduced by Van den Broeck\n[31] and was central to the proof that FO 2 is liftable. However, later work showed that simpler rules\nsuffice to capture FO 2 [27], and the domain recursion rule was forgotten.\nWe show that domain recursion is more powerful than expected, and can lift models that are otherwise\nnot amenable to domain-lifted inference. This includes an open problem by Beame et al. [3], asking\nfor an inference rule for a logical sentence called S4. 
It also includes the symmetric transitivity\nmodel, and an encoding of the birthday paradox in first-order logic. No efficient algorithm was previously\nknown for computing the partition function of these SRL models, and we obtain exponential\nspeedups. Next, we prove that domain recursion supports its own large classes of liftable models\nS 2FO 2 subsuming FO 2, and S 2RU subsuming recursively unary (see footnote 1). All existing exact lifted inference\nalgorithms (e.g., [11, 15, 28]) resort to grounding the theories in S 2FO 2 or S 2RU that are not in\nFO 2 or recursively unary, and require time exponential in the domain size.\nThese results will be established using the weighted first-order model counting (WFOMC) formulation\nof SRL models [28]. WFOMC is close to classical first-order logic, and it can encode many other\nSRL models, including Markov logic [24], parfactor graphs [23], some probabilistic programs [7],\nrelational Bayesian networks [12], and probabilistic databases [26]. It is a basic specification language\nthat simplifies the development of lifted inference algorithms [3, 11, 28].\n\n2 Background and Notation\n\nA population is a set of constants denoting individuals (or objects). A logical variable (LV) is typed\nwith a population. We represent LVs with lower-case letters, constants with upper-case letters, the\npopulation associated with a LV x with \u2206x, and its cardinality with |\u2206x|. That is, a population \u2206x is\na set of constants {X1, . . . , Xn}, and we use x \u2208 \u2206x as a shorthand for instantiating x with one of\nthe Xi. A parametrized random variable (PRV) is of the form F(t1, . . . , tk) where F is a predicate\nsymbol and each ti is a LV or a constant. A unary PRV contains exactly one LV and a binary PRV\ncontains exactly two LVs. A grounding of a PRV is obtained by replacing each of its LVs x by one\nof the individuals in \u2206x.\nA literal is a PRV or its negation. 
A formula \u03d5 is a literal, a disjunction \u03d51 \u2228 \u03d52 of formulas, a\nconjunction \u03d51 \u2227 \u03d52 of formulas, or a quantified formula \u2200x \u2208 \u2206x : \u03d5(x) or \u2203x \u2208 \u2206x : \u03d5(x)\nwhere x appears in \u03d5(x). A sentence is a formula with all LVs quantified. A clause is a disjunction\nof literals. A theory is a set of sentences. A theory is clausal if all its sentences are clauses. An\ninterpretation is an assignment of values to all ground PRVs in a theory. An interpretation I is a\nmodel of a theory T, I |= T, if given its value assignments, all sentences in T evaluate to True.\nLet F(T) be the set of predicate symbols in theory T, and \u03a6 : F(T) \u2192 R and \u03a6\u0304 : F(T) \u2192 R\nbe two functions that map each predicate F to weights. These functions associate a weight with\nassigning True or False to ground PRVs F(C1, . . . , Ck). For an interpretation I of T, let \u03c8True\nbe the set of ground PRVs assigned True, and \u03c8False the ones assigned False. The weight of I is\ngiven by\n\n\u03c9(I) = \u220f_{F(C1,...,Ck) \u2208 \u03c8True} \u03a6(F) \u00b7 \u220f_{F(C1,...,Ck) \u2208 \u03c8False} \u03a6\u0304(F).\n\nGiven a theory T and two functions \u03a6 and \u03a6\u0304, the weighted first-order model count (WFOMC) of the theory given \u03a6 and \u03a6\u0304 is:\n\nWFOMC(T | \u03a6, \u03a6\u0304) = \u2211_{I |= T} \u03c9(I).\n\nFootnote 1: All proofs can be found in the extended version of the paper at: https://arxiv.org/abs/1610.08445\n\nIn this paper, we assume that all theories are clausal and do not contain existential quantifiers. The\nlatter can be achieved using the Skolemization procedure of Van den Broeck et al. [30], which\nefficiently transforms a theory T with existential quantifiers into a theory T\u2032 without existential\nquantifiers that has the same weighted model count. 
That is, our theories are sets of finite-domain,\nfunction-free first-order clauses whose LVs are all universally quantified (and typed with a population).\nFurthermore, when a clause mentions two LVs x1 and x2 with the same population \u2206x, or a LV x\nwith population \u2206x and a constant C \u2208 \u2206x, we assume they refer to different individuals (see footnote 2).\nExample 1. Consider the theory \u2200x \u2208 \u2206x : \u00acSmokes(x) \u2228 Cancer(x) having only one clause and\nassume \u2206x = {A, B}. The assignment Smokes(A) = True, Smokes(B) = False, Cancer(A) =\nTrue, Cancer(B) = True is a model. Assuming \u03a6(Smokes) = 0.2, \u03a6(Cancer) = 0.8, \u03a6\u0304(Smokes) =\n0.5 and \u03a6\u0304(Cancer) = 1.2, the weight of this model is 0.2 \u00b7 0.5 \u00b7 0.8 \u00b7 0.8. This theory has eight other\nmodels. The WFOMC can be calculated by summing the weights of all nine models.\n\n2.1 Converting Inference for SRL Models into WFOMC\n\nFor many SRL models, (lifted) inference can be converted into a WFOMC problem. As an example,\nconsider a Markov logic network (MLN) [24] with weighted formulae (w1 : F1, . . . , wk : Fk). For\nevery weighted formula wi : Fi of this MLN, let theory T have a sentence Auxi(x, . . . ) \u21d4 Fi such\nthat Auxi is a predicate having all LVs appearing in Fi. Assuming \u03a6(Auxi) = exp(wi), and \u03a6 and\n\u03a6\u0304 are 1 for the other predicates, the partition function of the MLN is equal to WFOMC(T).\n\n2.2 Calculating the WFOMC of a Theory\nWe now describe a set of rules R that can be applied to a theory to find its WFOMC efficiently;\nfor more details, readers are directed to [28], [22] or [11]. We use the following theory T with two\nclauses and four PRVs (S(x, m), R(x, m), T(x) and Q(x)) as our running example:\n\n\u2200x \u2208 \u2206x, m \u2208 \u2206m : Q(x) \u2228 R(x, m) \u2228 S(x, m)\n\n\u2200x \u2208 \u2206x, m \u2208 \u2206m : S(x, m) \u2228 T(x)\n\nLifted Decomposition: Assume we ground x in T. 
Then the clauses mentioning an arbitrary\nXi \u2208 \u2206x are \u2200m \u2208 \u2206m : Q(Xi) \u2228 R(Xi, m) \u2228 S(Xi, m) and \u2200m \u2208 \u2206m : S(Xi, m) \u2228 T(Xi).\nThese clauses are totally disconnected from clauses mentioning Xj \u2208 \u2206x (j \u2260 i), and are the\nsame up to renaming Xi to Xj. Given the exchangeability of the individuals, we can calculate\nthe WFOMC of only the clauses mentioning Xi and raise the result to the power of the number of\nconnected components (|\u2206x|). Assuming T1 is the theory that results from substituting x with Xi,\nWFOMC(T) = WFOMC(T1)^{|\u2206x|}.\n\nCase-Analysis: The WFOMC of T1 can be computed by a case analysis over different assignments\nof values to a ground PRV, e.g., Q(Xi). Let T2 and T3 represent T1 \u2227 Q(Xi) and T1 \u2227 \u00acQ(Xi)\nrespectively. Then, WFOMC(T1) = WFOMC(T2) + WFOMC(T3). We follow the process for\nT3 (the process for T2 will be similar) having clauses \u00acQ(Xi), \u2200m \u2208 \u2206m : Q(Xi) \u2228 R(Xi, m) \u2228\nS(Xi, m) and \u2200m \u2208 \u2206m : S(Xi, m) \u2228 T(Xi).\n\nUnit Propagation: When a clause in the theory has only one literal, we can propagate the effect\nof this clause through the theory and remove it (see footnote 3). In T3, \u00acQ(Xi) is a unit clause. Having this\nunit clause, we can simplify the second clause and get the theory T4 having clauses \u2200m \u2208 \u2206m :\nR(Xi, m) \u2228 S(Xi, m) and \u2200m \u2208 \u2206m : S(Xi, m) \u2228 T(Xi).\n\nLifted Case-Analysis: Case-analysis can be done for PRVs having one logical variable in a lifted\nway. Consider the S(Xi, m) in T4. Due to the exchangeability of the individuals, we do not have\nto consider all possible assignments to all ground PRVs of S(Xi, m), but only the ones where the\nnumber of individuals M \u2208 \u2206m for which S(Xi, M) is True (or equivalently False) is different.\nThis means considering |\u2206m| + 1 cases suffices, corresponding to S(Xi, M) being True for exactly\nj = 0, . . . , |\u2206m| individuals. Note that we must multiply by the binomial coefficient binom(|\u2206m|, j) to account for the number\nof ways one can select j out of |\u2206m| individuals. Let T4j represent T4 with two more clauses:\n\u2200m \u2208 \u2206mT : S(Xi, m) and \u2200m \u2208 \u2206mF : \u00acS(Xi, m), where \u2206mT represents the j individuals\nin \u2206m for which S(Xi, M) is True, and \u2206mF represents the other |\u2206m| \u2212 j individuals. Then\n\nWFOMC(T4) = \u2211_{j=0}^{|\u2206m|} binom(|\u2206m|, j) \u00b7 WFOMC(T4j).\n\nFootnote 2: Equivalently, we can disjoin x1 = x2 or x = C to the clause.\nFootnote 3: Note that unit propagation may remove clauses and random variables from the theory. To account for them,\nsmoothing multiplies the WFOMC by 2^{#rv}, where #rv represents the number of removed variables.\n\nShattering: In T4j, the individuals in \u2206m are no longer exchangeable: we know different things\nabout those in \u2206mT and those in \u2206mF. We need to shatter every clause having individuals coming\nfrom \u2206m to make the theory exchangeable. To do so, the clause \u2200m \u2208 \u2206m : R(Xi, m) \u2228 S(Xi, m) in\nT4j must be shattered to \u2200m \u2208 \u2206mT : R(Xi, m) \u2228 S(Xi, m) and \u2200m \u2208 \u2206mF : R(Xi, m) \u2228 S(Xi, m)\n(and similarly for the other formulae). The shattered theory T5j after unit propagation will have\nclauses \u2200m \u2208 \u2206mF : R(Xi, m) and \u2200m \u2208 \u2206mF : T(Xi).\n\nDecomposition, Caching, and Grounding: In T5j, the two clauses have different PRVs, i.e., they\nare disconnected. In such cases, we apply decomposition, i.e., find the WFOMC of each connected\ncomponent separately and return the product. The WFOMC of the theory can be found by continuing\nto apply the above rules. In all the above steps, after finding the WFOMC of each (sub-)theory, we\nstore the results in a cache so we can reuse them if the same WFOMC is required again. 
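
The rules above can be sketched for the running example. The following Python snippet is only an illustration and is not the authors' implementation: the function names, the weight dictionary layout (predicate name mapped to a pair of True and False weights) and the hand-derived closed form are assumptions made here. It computes the WFOMC of the running theory T once by enumerating every interpretation and once by following lifted decomposition, case-analysis on Q(Xi), lifted case-analysis on S(Xi, m) and unit propagation, where each atom left unconstrained contributes the sum of its two weights (the weighted analogue of the smoothing factor in footnote 3).

```python
from itertools import product
from math import comb


def brute_force_wfomc(nx, nm, w):
    # Enumerate all interpretations of the ground atoms and sum model weights.
    atoms = ([('Q', x) for x in range(nx)] + [('T', x) for x in range(nx)]
             + [(p, x, m) for p in 'RS' for x in range(nx) for m in range(nm)])
    total = 0
    for values in product((True, False), repeat=len(atoms)):
        interp = dict(zip(atoms, values))
        is_model = all(
            (interp['Q', x] or interp['R', x, m] or interp['S', x, m])
            and (interp['S', x, m] or interp['T', x])
            for x in range(nx) for m in range(nm))
        if is_model:
            weight = 1
            for atom, value in interp.items():
                weight *= w[atom[0]][0 if value else 1]
            total += weight
    return total


def lifted_wfomc(nx, nm, w):
    # Hand-derived lifted computation for this one theory (an assumption of
    # this sketch, not a general-purpose lifted inference engine).
    (qt, qf), (rt, rf), (st, sf), (tt, tf) = w['Q'], w['R'], w['S'], w['T']
    per_x = 0
    for q_true in (True, False):              # case-analysis on Q(Xi)
        wq = qt if q_true else qf
        for j in range(nm + 1):               # lifted case-analysis: j atoms S(Xi, m) are True
            n_false = nm - j
            s_part = st ** j * sf ** n_false
            # R(Xi, m) is free where S(Xi, m) or Q(Xi) is True, otherwise forced True.
            r_part = (rt + rf) ** j * ((rt + rf) if q_true else rt) ** n_false
            # T(Xi) is forced True as soon as some S(Xi, m) is False.
            t_part = tt if n_false > 0 else (tt + tf)
            per_x += wq * comb(nm, j) * s_part * r_part * t_part
    return per_x ** nx                        # lifted decomposition over the individuals in x


weights = {'Q': (2, 1), 'R': (1, 3), 'S': (3, 2), 'T': (1, 5)}
print(lifted_wfomc(2, 2, weights), brute_force_wfomc(2, 2, weights))
```

The enumeration visits a number of interpretations exponential in the domain sizes, while the lifted function performs a number of arithmetic operations linear in the size of the second population, which is the gap the rules are meant to capture.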
By following\nthese steps, one can find the WFOMC of many theories in polynomial time. However, if we reach a\npoint where none of the above rules are applicable, we ground one of the populations, which makes\nthe process exponential in the number of individuals.\n\n2.3 Domain-Liftability\n\nThe following notions allow us to study the power of a set of lifted inference rules.\nDefinition 1. A theory is domain-liftable [31] if calculating its WFOMC is polynomial in\n|\u2206x1|, |\u2206x2|, . . . , |\u2206xk| where x1, x2, . . . , xk represent the LVs in the theory. A class C of theories\nis domain-liftable if \u2200T \u2208 C, T is domain-liftable.\n\nSo far, two main classes of domain-liftable theories have been recognized: FO 2 [30, 31] and\nrecursively unary [15, 22].\nDefinition 2. A theory is in FO 2 if all its clauses have up to two LVs.\nDefinition 3. A theory T is recursively unary (RU) if for every theory T\u2032 resulting from applying\nrules in R except for lifted case analysis to T, until no more rules apply, there exists some unary PRV\nin T\u2032 and a generic case of lifted case-analysis on this unary PRV is itself RU.\n\nNote that the time needed to check whether a theory is in FO 2 or RU is independent of the domain\nsizes in the theory. For FO 2, the membership check can be done in time linear in the size of the\ntheory, whereas for RU, only a worst-case exponential procedure is known. Thus, FO 2 currently\noffers a faster membership check than RU, but as we show later, RU subsumes FO 2. This gives rise to\na trade-off between fast membership checking and modeling power for, e.g., lifted learning purposes.\n\n3 The Domain Recursion Rule\n\nVan den Broeck [31] considered another rule called domain recursion in the set of rules for calculating\nthe WFOMC of a theory. 
The intuition behind domain recursion is that it modifies a domain \u2206x by\nmaking one element explicit: \u2206x = \u2206x\u2032 \u222a {A} with A \u2209 \u2206x\u2032. Next, clauses are rewritten in terms\nof \u2206x\u2032 and A while removing \u2206x from the theory entirely. Then, by applying standard rules in R\non this modified theory, the problem is reduced to a WFOMC problem on a theory identical to the\noriginal one, except that \u2206x is replaced by the smaller domain \u2206x\u2032. This lets us compute WFOMC\nusing dynamic programming. We refer to R extended with the domain recursion rule as RD.\nExample 2. Suppose we have a theory whose only clause is \u2200x, y \u2208 \u2206p : \u00acFriend(x, y) \u2228\nFriend(y, x), stating that if x is friends with y, then y is also friends with x. One way to calculate the\nWFOMC of this theory is by grounding only one individual in \u2206p and then using R. Let A be an\nindividual in \u2206p and let \u2206p\u2032 = \u2206p \u2212 {A}. We can (using domain recursion) rewrite the theory\nas: \u2200x \u2208 \u2206p\u2032 : \u00acFriend(x, A) \u2228 Friend(A, x), \u2200y \u2208 \u2206p\u2032 : \u00acFriend(A, y) \u2228 Friend(y, A), and\n\u2200x, y \u2208 \u2206p\u2032 : \u00acFriend(x, y) \u2228 Friend(y, x). Lifted case-analysis on Friend(p\u2032, A) and Friend(A, p\u2032),\nshattering and unit propagation give \u2200x, y \u2208 \u2206p\u2032 : \u00acFriend(x, y) \u2228 Friend(y, x). This theory is\nequivalent to our initial theory, with the only difference being that the population of people has\ndecreased by one. By keeping a cache of the values of each sub-theory, one can verify that this\nprocess finds the WFOMC of the above theory in polynomial time.\n\nNote that the theory in Example 2 is in FO 2 and as proved in [27], its WFOMC can be computed\nwithout using the domain recursion rule (see footnote 4). 
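
As a rough illustration of Example 2, the recursion on the domain size can be written as a small dynamic program. This is a sketch under stated assumptions rather than the authors' C++ implementation: the function names are hypothetical, the recurrence was derived by hand from the rewritten clauses, and reflexive atoms Friend(x, x) are left out because the two LVs of a clause are assumed to denote different individuals. Removing individual A leaves k - 1 pairs of atoms Friend(x, A) and Friend(A, x) that must agree, and lifted case-analysis counts how many of those pairs are True.

```python
from itertools import product
from math import comb


def domain_recursion_wfomc(n, wt, wf):
    # cache[k] holds the WFOMC of the symmetry theory over k individuals.
    cache = {0: 1, 1: 1}
    for k in range(2, n + 1):
        # Lifted case-analysis over j, the number of x with Friend(x, A) True;
        # Friend(A, x) is then forced to the same value by the two clauses.
        branches = sum(comb(k - 1, j) * wt ** (2 * j) * wf ** (2 * (k - 1 - j))
                       for j in range(k))
        cache[k] = branches * cache[k - 1]    # same theory, one individual fewer
    return cache[n]


def brute_force_symmetric(n, wt, wf):
    # Reference computation: enumerate every interpretation of Friend(x, y), x != y.
    atoms = [(x, y) for x in range(n) for y in range(n) if x != y]
    total = 0
    for values in product((True, False), repeat=len(atoms)):
        friend = dict(zip(atoms, values))
        if all((not friend[x, y]) or friend[y, x] for x, y in atoms):
            n_true = sum(values)
            total += wt ** n_true * wf ** (len(atoms) - n_true)
    return total


print(domain_recursion_wfomc(4, 2, 3), brute_force_symmetric(4, 2, 3))
```

The table is filled with a polynomial number of arithmetic operations in n, whereas the enumeration grows exponentially in the number of ground atoms.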
This proof has caused the domain recursion rule to be\nforgotten in the lifted inference community. In the next section, we revive this rule and identify a\nclass of theories that are only domain-liftable when using the domain recursion rule.\n\n4 Domain Recursion Makes More Theories Domain-Liftable\nIn this section, we show three example theories that are not domain-liftable when using R, yet\nbecome domain-liftable with domain recursion.\nS4 Clause: Beame et al. [3] identi\ufb01ed a clause (S4) with four binary PRVs having the same predicate\nand proved that, even though the rules R in Section 2.2 cannot calculate the WFOMC of that clause,\nthere is a polynomial-time algorithm for \ufb01nding its WFOMC. They concluded that this set of rules R\nfor \ufb01nding the WFOMC of theories does not suf\ufb01ce, asking for new rules to compute their theory.\nWe prove that adding domain recursion to the set achieves this goal.\nProposition 1. The theory consisting of the S4 clause \u2200x1, x2 \u2208 \u2206x, y1, y2 \u2208 \u2206y : S(x1, y1) \u2228\n\u00acS(x2, y1) \u2228 S(x2, y2) \u2228 \u00acS(x1, y2) is domain-liftable using RD.\nSymmetric Transitivity: Domain-liftable calculation of WFOMC for the transitivity formula is\na long-standing open problem. Symmetric transitivity is easier as its model count corresponds to\nthe Bell number, but solving it using general-purpose rules has been an open problem. Consider\nclauses \u2200x, y, z \u2208 \u2206p : \u00acF(x, y) \u2228 \u00acF(y, z) \u2228 F(x, z) and \u2200x, y \u2208 \u2206p : \u00acF(x, y) \u2228 F(y, x) de\ufb01ning\na symmetric transitivity relation. For example, \u2206p may indicate the population of people and F may\nindicate friendship.\nProposition 2. The symmetric-transitivity theory is domain-liftable using RD.\nBirthday Paradox: The birthday paradox problem [2] is to compute the probability that in a set\nof n randomly chosen people, two of them have the same birthday. 
A \ufb01rst-order encoding of this\nproblem requires computing the WFOMC for a theory with clauses \u2200p \u2208 \u2206p,\u2203d \u2208 \u2206d : Born(p, d),\n\u2200p \u2208 \u2206p, d1, d2 \u2208 \u2206d : \u00acBorn(p, d1) \u2228 \u00acBorn(p, d2), and \u2200p1, p2 \u2208 \u2206p, d \u2208 \u2206d : \u00acBorn(p1, d) \u2228\n\u00acBorn(p2, d), where \u2206p and \u2206d represent the population of people and days. The \ufb01rst two clauses\nimpose the condition that every person is born in exactly one day, and the third clause states the \u201cno\ntwo people are born on the same day\u201d query.\nProposition 3. The birthday-paradox theory is domain-liftable using RD.\n\n5 New Domain-Liftable Classes: S 2FO 2 and S 2RU\n\nIn this section, we identify new domain-liftable classes, enabled by the domain recursion rule.\nDe\ufb01nition 4. Let \u03b1(S) be a clausal theory that uses a single binary predicate S, such that each clause\nhas exactly two different literals of S. Let \u03b1 = \u03b1(S1)\u2227 \u03b1(S2)\u2227\u00b7\u00b7\u00b7\u2227 \u03b1(Sn) where the Si are different\nbinary predicates. Let \u03b2 be a theory where all clauses contain at most one Si literal, and the clauses\nthat contain an Si literal contain no other literals with more than one LV. Then, S 2FO 2 and S 2RU\nare the classes of theories of the form \u03b1 \u2227 \u03b2 where \u03b2 \u2208 FO 2 and \u03b2 \u2208 RU respectively.\nTheorem 1. S 2FO 2 and S 2RU are domain-liftable using RD.\nProof. The case where \u03b1 = \u2205 is trivial. Let \u03b1 = \u03b1(S1) \u2227 \u03b1(S2) \u2227 \u00b7\u00b7\u00b7 \u2227 \u03b1(Sn). 
Once we remove\nall PRVs having none or one LV by (lifted) case-analysis, the remaining clauses can be divided into\nn + 1 components: the i-th component in the first n components only contains Si literals, and the\n(n + 1)-th component contains no Si literals. These components are disconnected from each other,\nso we can consider each of them separately. The (n + 1)-th component comes from clauses in \u03b2\nand is domain-liftable by definition. The following two lemmas prove that the clauses in the other\ncomponents are also domain-liftable. The proofs of both lemmas rely on domain recursion.\n\nFootnote 4: This can be done by realizing that the theory is disconnected in the grounding for every pair (A, B) of\nindividuals and applying the lifted case-analysis.\n\nLemma 1. A clausal theory \u03b1(S) with only one predicate S where all clauses have exactly two\ndifferent literals of S is domain-liftable.\nLemma 2. Suppose {\u2206p1, \u2206p2, . . . , \u2206pn} are mutually exclusive subsets of \u2206x and\n{\u2206q1, \u2206q2, . . . , \u2206qm} are mutually exclusive subsets of \u2206y. We can add any unit clause of the\nform \u2200pi \u2208 \u2206pi, qj \u2208 \u2206qj : S(pi, qj) or \u2200pi \u2208 \u2206pi, qj \u2208 \u2206qj : \u00acS(pi, qj) to the theory \u03b1(S) in\nLemma 1 and the theory is still domain-liftable.\n\nTherefore, theories in S 2FO 2 and S 2RU are domain-liftable.\n\nIt can be easily verified that membership checking for S 2FO 2 and S 2RU is not harder than for FO 2\nand RU, respectively.\nExample 3. Suppose we have a set \u2206j of jobs and a set \u2206v of volunteers. Every volunteer must\nbe assigned to at most one job, and every job requires no more than one person. If the job involves\nworking with gas, the assigned volunteer must be a non-smoker. And we know that smokers are most\nprobably friends with each other. 
Then we will have the following \ufb01rst-order theory:\n\n\u2200v1, v2 \u2208 \u2206v, j \u2208 \u2206j : \u00acAssigned(v1, j) \u2228 \u00acAssigned(v2, j)\n\u2200v \u2208 \u2206v, j1, j2 \u2208 \u2206j : \u00acAssigned(v, j1) \u2228 \u00acAssigned(v, j2)\n\n\u2200v \u2208 \u2206v, j \u2208 \u2206j : InvolvesGas(j) \u2227 Assigned(v, j) \u21d2 \u00acSmokes(v)\n\n\u2200v1, v2 \u2208 \u2206v : Aux(v1, v2) \u21d4 (Smokes(v1) \u2227 Friends(v1, v2) \u21d2 Smokes(v2))\n\nPredicate Aux is added to capture the probability assigned to the last rule (as in MLNs). This theory\nis not in FO 2, not in RU, and is not domain-liftable using R. However, the \ufb01rst two clauses are\nof the form described in Lemma 1, the third and fourth are in FO 2 (and also in RU), and the third\nclause, which contains Assigned(v, j), has no other PRVs with more than one LV. Therefore, this\ntheory is in S 2FO 2 (and also in S 2RU ) and domain-liftable based on Theorem 1.\nExample 4. Consider the birthday paradox introduced in Section 4. After Skolemization [30] for\nremoving the existential quanti\ufb01er, the theory contains \u2200p \u2208 \u2206p,\u2200d \u2208 \u2206d : S(p) \u2228 \u00acBorn(p, d),\n\u2200p \u2208 \u2206p, d1, d2 \u2208 \u2206d : \u00acBorn(p, d1) \u2228 \u00acBorn(p, d2), and \u2200p1, p2 \u2208 \u2206p, d \u2208 \u2206d : \u00acBorn(p1, d) \u2228\n\u00acBorn(p2, d), where S is the Skolem predicate. This theory is not in FO 2, not in RU, and is not\ndomain-liftable using R. However, the last two clauses belong to clauses in Lemma 1, the \ufb01rst one is\nin FO 2 (and also in RU) and has no PRVs with more than one LV other than Born. Therefore, this\ntheory is in S 2FO 2 (and also in S 2RU ) and domain-liftable based on Theorem 1.\nProposition 4. FO 2 \u2282 RU, FO 2 \u2282 S 2FO 2, FO 2 \u2282 S 2RU , RU \u2282 S 2RU , S 2FO 2 \u2282 S 2RU .\n\nProof. 
Let T \u2208 FO 2 and T\u2032 be any of the theories resulting from exhaustively applying rules in\nR except lifted case analysis on T. If T initially contains a unary PRV with predicate S, either it\nis still unary in T\u2032 or lifted decomposition has replaced the LV with a constant. In the first case,\nwe can follow a generic branch of lifted case-analysis on S, and in the second case, either T\u2032 is\nempty or all binary PRVs in T have become unary in T\u2032 due to applying the lifted decomposition\nand we can follow a generic branch of lifted case-analysis for any of these PRVs. The generic\nbranch in both cases is in FO 2 and the same procedure can be followed until all theories become\nempty. If T initially contains only binary PRVs, lifted decomposition applies as the grounding of\nT is disconnected for each pair of individuals, and after lifted decomposition all PRVs have no\nLVs. Applying case analysis on all PRVs gives empty theories. Therefore, T \u2208 RU. The theory\n\u2200x, y, z \u2208 \u2206p : F(x, y) \u2228 F(y, z) \u2228 F(x, y, z) is an example of an RU theory that is not in FO 2,\nshowing RU \u2284 FO 2. FO 2 and RU are special cases of S 2FO 2 and S 2RU respectively, where\n\u03b1 = \u2205, showing FO 2 \u2282 S 2FO 2 and RU \u2282 S 2RU. However, Example 3 is both in S 2FO 2\nand S 2RU but is not in FO 2 and not in RU, showing S 2FO 2 \u2284 FO 2 and S 2RU \u2284 RU. 
Since\nFO 2 \u2282 RU and the class of added \u03b1(S) clauses is the same, S 2FO 2 \u2282 S 2RU.\n\nFigure 1: Run-times for calculating the WFOMC of (a) the theory in Example 3, (b) the S4 clause, and\n(c) symmetric transitivity, using the WFOMC-v3.0 software (which only uses R) and comparing it to\nthe case where we use the domain recursion rule, referred to as Domain Recursion in the diagrams.\n\n6 Experiments and Results\n\nIn order to see the effect of using domain recursion in practice, we find the WFOMC of three theories\nwith and without using the domain recursion rule: (a) the theory in Example 3, (b) the S4 clause, and\n(c) the symmetric-transitivity theory. We implemented the domain recursion rule in C++ and compiled\nthe code using the g++ compiler. We compare our results with the WFOMC-v3.0 software (see footnote 5). Since\nthis software requires domain-liftable input theories, for the first theory we grounded the jobs, for\nthe second we grounded \u2206x, and for the third we grounded \u2206p. For each of these three theories,\nassuming |\u2206x| = n for all LVs x in the theory, we varied n and plotted the run-time as a function\nof n. All experiments were done on a 2.8 GHz core with 4 GB RAM under Mac OS X. The run-times\nare reported in seconds. We allowed a maximum of 1000 seconds for each run.\nThe results are shown in Fig. 1. These results are consistent with our theory and indicate\nthe clear advantage of using the domain recursion rule in practice. In Fig. 1(a), the slope of the\ndiagram for domain recursion is approximately 4, which indicates the degree of the polynomial for\nthe time complexity. Similar analysis can be done for the results on the S4 clause and the\nsymmetric-transitivity clauses represented in Fig. 1(b), (c). 
The slopes in these two diagrams are\naround 5 and 2 respectively, indicating that the time complexities of finding their WFOMC are n^5 and\nn^2 respectively, where n is the size of the population.\n\n7 Discussion\n\nWe can categorize theories with respect to the domain recursion rule as: (1) theories proved to be\ndomain-liftable using domain recursion (e.g., S4, symmetric transitivity, and theories in S 2FO 2),\n(2) theories that are domain-liftable using domain recursion, but we have not identified them yet\nas such, and (3) theories that are not domain-liftable even when using domain recursion. We leave\ndiscovering and characterizing the theories in categories 2 and 3 as future work. But here we show that\neven though the theories in category 3 are not domain-liftable using domain recursion, this rule may\nstill result in exponential speedups for these theories.\nConsider the (non-symmetric) transitivity rule: \u2200x, y, z \u2208 \u2206p : \u00acFriend(x, y) \u2228 \u00acFriend(y, z) \u2228\nFriend(x, z). Since none of the rules in R apply to the above theory, the existing lifted inference\nengines ground \u2206p and calculate the weighted model count (WMC) of the ground theory. By\ngrounding \u2206p, these engines lose great amounts of symmetry. Suppose \u2206p = {A, B, C} and assume\nwe select Friend(A, B) and Friend(A, C) as the first two random variables for case-analysis. Due to\nthe exchangeability of the individuals, the case where Friend(A, B) and Friend(A, C) are assigned to\nTrue and False respectively has the same WMC as the case where they are assigned to False and True.\nHowever, the current engines fail to exploit this symmetry as they consider grounded individuals\nnon-exchangeable.\nBy applying domain recursion to the above theory instead of fully grounding it, one can exploit the\nsymmetries of the theory. Suppose \u2206p\u2032 = \u2206p \u2212 {P}. 
Then we can rewrite the theory as follows:\n\n\u2200y, z \u2208 \u2206p\u2032 : \u00acFriend(P, y) \u2228 \u00acFriend(y, z) \u2228 Friend(P, z)\n\u2200x, z \u2208 \u2206p\u2032 : \u00acFriend(x, P) \u2228 \u00acFriend(P, z) \u2228 Friend(x, z)\n\u2200x, y \u2208 \u2206p\u2032 : \u00acFriend(x, y) \u2228 \u00acFriend(y, P) \u2228 Friend(x, P)\n\u2200x, y, z \u2208 \u2206p\u2032 : \u00acFriend(x, y) \u2228 \u00acFriend(y, z) \u2228 Friend(x, z)\n\nNow if we apply lifted case analysis on Friend(P, y) (or equivalently on Friend(P, z)), we do not\nget back the same theory with a reduced population, and calculating the WFOMC is still exponential.\nHowever, we generate only one branch for the case where Friend(P, y) is True for exactly one individual. This\nbranch covers both of the symmetric cases mentioned above. Exploiting these symmetries reduces the\ntime complexity exponentially.\n\nThis suggests that for any given theory, when the rules in R are not applicable, one may want to try\nthe domain recursion rule before giving up and resorting to grounding a population.\n\n5 Available at: https://dtai.cs.kuleuven.be/software/wfomc\n\n[Figure 1, panels (a), (b), (c): time in seconds against population size on logarithmic axes, for WFOMC-v3.0 and Domain Recursion.]\n\n8 Conclusion\n\nWe identified new classes of domain-liftable theories called S2FO2 and S2RU by reviving the\ndomain recursion rule. We also demonstrated how this rule is useful for theories outside these\nclasses. Our work opens up a future research direction for identifying and characterizing larger\nclasses of theories that are domain-liftable using domain recursion. 
It also helps us get closer to\nfinding a dichotomy between the theories that are domain-liftable and those that are not, similar to\nthe dichotomy result of Dalvi and Suciu [6] for query answering in probabilistic databases.\n\nIt has been shown [15, 16] that compiling the WFOMC rules into low-level programs (e.g., C++\nprograms) offers an approximately 175x speedup compared to other approaches. While compiling the\npreviously known rules to low-level programs was straightforward, compiling the domain recursion\nrule to low-level programs without using recursion might be tricky, as it relies on the population size\nof the logical variables. A future research direction would be to determine whether the domain recursion rule can\nbe efficiently compiled into low-level programs, and to measure the speedup it offers.\n\nAcknowledgements. AK is supported by the Research Foundation Flanders (FWO). GVdB is partially supported\nby NSF (#IIS-1633857).\n\nReferences\n\n[1] Babak Ahmadi, Kristian Kersting, and Sriraam Natarajan. Lifted online training of relational models with stochastic gradient methods. In ECML PKDD, pages 585\u2013600, 2012.\n\n[2] W. W. Rouse Ball. Other questions on probability. Mathematical Recreations and Essays, page 45, 1960.\n\n[3] Paul Beame, Guy Van den Broeck, Eric Gribkoff, and Dan Suciu. Symmetric weighted first-order model counting. In PODS, pages 313\u2013328, 2015.\n\n[4] Hung Hai Bui, Tuyen N Huynh, and Sebastian Riedel. Automorphism groups of graphical models and lifted variational inference. In UAI, page 132, 2013.\n\n[5] Jaesik Choi, Rodrigo de Salvo Braz, and Hung H. Bui. Efficient methods for lifted inference with aggregate factors. In AAAI, 2011.\n\n[6] Nilesh Dalvi and Dan Suciu. Efficient query evaluation on probabilistic databases. The VLDB Journal, 16(4):523\u2013544, 2007.\n\n[7] Luc De Raedt, Angelika Kimmig, and Hannu Toivonen. 
ProbLog: A probabilistic Prolog and its application in link discovery. In IJCAI, volume 7, 2007.\n\n[8] Luc De Raedt, Kristian Kersting, Sriraam Natarajan, and David Poole. Statistical relational artificial intelligence: Logic, probability, and computation. Synthesis Lectures on Artificial Intelligence and Machine Learning, 10(2):1\u2013189, 2016.\n\n[9] Rodrigo de Salvo Braz, Eyal Amir, and Dan Roth. Lifted first-order probabilistic inference. In IJCAI, pages 1319\u20131325, 2005.\n\n[10] Lise Getoor and Ben Taskar. Introduction to statistical relational learning. MIT press, 2007.\n\n[11] Vibhav Gogate and Pedro Domingos. Probabilistic theorem proving. In UAI, pages 256\u2013265, 2011.\n\n[12] Manfred Jaeger. Relational Bayesian networks. In UAI. Morgan Kaufmann Publishers Inc., 1997.\n\n[13] Yacine Jernite, Alexander M Rush, and David Sontag. A fast variational approach for learning Markov random field language models. In ICML, 2015.\n\n[14] Abhay Jha, Vibhav Gogate, Alexandra Meliou, and Dan Suciu. Lifted inference seen from the other side: The tractable features. In NIPS, pages 973\u2013981, 2010.\n\n[15] Seyed Mehran Kazemi and David Poole. Knowledge compilation for lifted probabilistic inference: Compiling to a low-level language. In KR, 2016.\n\n[16] Seyed Mehran Kazemi and David Poole. Why is compiling lifted inference into a low-level language so effective? arXiv preprint arXiv:1606.04512, 2016.\n\n[17] Kristian Kersting, Babak Ahmadi, and Sriraam Natarajan. Counting belief propagation. In UAI, pages 277\u2013284, 2009.\n\n[18] Daphne Koller and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, Cambridge, MA, 2009.\n\n[19] Timothy Kopp, Parag Singla, and Henry Kautz. Lifted symmetry detection and breaking for MAP inference. In NIPS, pages 1315\u20131323, 2015.\n\n[20] Brian Milch, Luke S. Zettlemoyer, Kristian Kersting, Michael Haimes, and Leslie Pack Kaelbling. 
Lifted probabilistic inference with counting formulae. In AAAI, pages 1062\u20131068, 2008.\n\n[21] Mathias Niepert. Markov chains on orbits of permutation groups. In UAI, 2012.\n\n[22] David Poole, Fahiem Bacchus, and Jacek Kisynski. Towards completely lifted search-based probabilistic inference. arXiv:1107.4035 [cs.AI], 2011.\n\n[23] David Poole. First-order probabilistic inference. In IJCAI, pages 985\u2013991, 2003.\n\n[24] Matthew Richardson and Pedro Domingos. Markov logic networks. Machine Learning, 62:107\u2013136, 2006.\n\n[25] Parag Singla and Pedro M Domingos. Lifted first-order belief propagation. In AAAI, volume 8, pages 1094\u20131099, 2008.\n\n[26] Dan Suciu, Dan Olteanu, Christopher R\u00e9, and Christoph Koch. Probabilistic databases. Synthesis Lectures on Data Management, 3(2):1\u2013180, 2011.\n\n[27] Nima Taghipour, Daan Fierens, Guy Van den Broeck, Jesse Davis, and Hendrik Blockeel. Completeness results for lifted variable elimination. In AISTATS, pages 572\u2013580, 2013.\n\n[28] Guy Van den Broeck, Nima Taghipour, Wannes Meert, Jesse Davis, and Luc De Raedt. Lifted probabilistic inference by first-order knowledge compilation. In IJCAI, pages 2178\u20132185, 2011.\n\n[29] Guy Van den Broeck, Arthur Choi, and Adnan Darwiche. Lifted relax, compensate and then recover: From approximate to exact lifted probabilistic inference. In UAI, 2012.\n\n[30] Guy Van den Broeck, Wannes Meert, and Adnan Darwiche. Skolemization for weighted first-order model counting. In KR, 2014.\n\n[31] Guy Van den Broeck. On the completeness of first-order knowledge compilation for lifted probabilistic inference. In NIPS, pages 1386\u20131394, 2011.\n\n[32] Jan Van Haaren, Guy Van den Broeck, Wannes Meert, and Jesse Davis. Lifted generative learning of Markov logic networks. Machine Learning, pages 1\u201329, 2015.\n\n[33] Deepak Venugopal and Vibhav Gogate. 
Evidence-based clustering for scalable inference in Markov logic. In ECML PKDD, pages 258\u2013273, 2014.\n\n[34] Deepak Venugopal and Vibhav G Gogate. Scaling-up importance sampling for Markov logic networks. In NIPS, pages 2978\u20132986, 2014.\n", "award": [], "sourceid": 1549, "authors": [{"given_name": "Seyed Mehran", "family_name": "Kazemi", "institution": "UBC"}, {"given_name": "Angelika", "family_name": "Kimmig", "institution": "KU Leuven"}, {"given_name": "Guy", "family_name": "Van den Broeck", "institution": "UCLA"}, {"given_name": "David", "family_name": "Poole", "institution": "UBC"}]}