{"title": "Lifted Inference Rules With Constraints", "book": "Advances in Neural Information Processing Systems", "page_first": 3519, "page_last": 3527, "abstract": "Lifted inference rules exploit symmetries for fast reasoning in statistical rela-tional models. Computational complexity of these rules is highly dependent onthe choice of the constraint language they operate on and therefore coming upwith the right kind of representation is critical to the success of lifted inference.In this paper, we propose a new constraint language, called setineq, which allowssubset, equality and inequality constraints, to represent substitutions over the vari-ables in the theory. Our constraint formulation is strictly more expressive thanexisting representations, yet easy to operate on. We reformulate the three mainlifting rules: decomposer, generalized binomial and the recently proposed singleoccurrence for MAP inference, to work with our constraint representation. Exper-iments on benchmark MLNs for exact and sampling based inference demonstratethe effectiveness of our approach over several other existing techniques.", "full_text": "Lifted Inference Rules with Constraints\n\nHappy Mittal, Anuj Mahajan\nDept. of Comp. Sci. & Engg.\n\nI.I.T. Delhi, Hauz Khas\nNew Delhi, 110016, India\n\nhappy.mittal@cse.iitd.ac.in,\nanujmahajan.iitd@gmail.com\n\nVibhav Gogate\n\nDept. of Comp. Sci.\nUniv. of Texas Dallas\n\nRichardson, TX 75080, USA\n\nParag Singla\n\nDept. of Comp. Sci. & Engg.\n\nI.I.T. Delhi, Hauz Khas\nNew Delhi, 110016, India\n\nvgogate@hlt.utdallas.edu\n\nparags@cse.iitd.ac.in\n\nAbstract\n\nLifted inference rules exploit symmetries for fast reasoning in statistical rela-\ntional models. 
Computational complexity of these rules is highly dependent on\nthe choice of the constraint language they operate on and therefore coming up\nwith the right kind of representation is critical to the success of lifted inference.\nIn this paper, we propose a new constraint language, called setineq, which allows\nsubset, equality and inequality constraints, to represent substitutions over the vari-\nables in the theory. Our constraint formulation is strictly more expressive than\nexisting representations, yet easy to operate on. We reformulate the three main\nlifting rules: decomposer, generalized binomial and the recently proposed single\noccurrence for MAP inference, to work with our constraint representation. Exper-\niments on benchmark MLNs for exact and sampling based inference demonstrate\nthe effectiveness of our approach over several other existing techniques.\n\nIntroduction\n\n1\nStatistical relational models such as Markov logic [5] have the power to represent the rich relational\nstructure as well as the underlying uncertainty, both of which are the characteristics of several real\nworld application domains. Inference in these models can be carried out using existing probabilistic\ninference techniques over the propositionalized theory (e.g., Belief propagation, MCMC sampling,\netc.). This approach can be sub-optimal since it ignores the rich underlying structure in the relational\nrepresentation, and as a result does not scale to even moderately sized domains in practice.\nLifted inference ameliorates the aforementioned problems by identifying indistinguishable atoms,\ngrouping them together and inferring directly over the groups instead of individual atoms. Starting\nwith the work of Poole [21], a number of lifted inference algorithms have been proposed. 
These include lifted exact inference techniques such as lifted Variable Elimination (VE) [3, 17], lifted approximate inference algorithms based on message passing such as belief propagation [23, 14, 24], lifted sampling based algorithms [26, 12], lifted search [11], lifted variational inference [2, 20] and lifted knowledge compilation [10, 6, 9]. There has also been some recent work which examines the complexity of lifted inference independent of the specific algorithm used [13, 2, 8].\nJust as probabilistic inference algorithms use various rules such as sum-out, conditioning and decomposition to exploit the problem structure, lifted inference algorithms use lifted inference rules to exploit the symmetries. All of them work with an underlying constraint representation that specifies the allowed set of substitutions over variables appearing in the theory. Examples of various constraint representations include weighted parfactors with constraints [3], normal form parfactors [17], hypercube based representations [24], tree based constraints [25] and the constraint free normal form [13]. These formalisms differ from each other not only in terms of the underlying constraint representation but also in how these constraints are processed, e.g., whether they require a constraint solver, splitting as needed versus shattering [15], etc.\nThe choice of the underlying constraint language can have a significant impact on the time as well as memory complexity of the inference procedure [15], and coming up with the right kind of constraint representation is of prime importance for the success of lifted inference techniques. Although there\n\nApproach | Constraint Type | Constraint Aggregation | Tractable Solver | Lifting Algorithm\nLifted VE [4] | eq/ineq, no subset | intersection, no union | no | lifted VE\nCFOVE [17] | eq/ineq, no subset | intersection, no union | yes | lifted VE\nGCFOVE [25] | subset (tree-based), no inequality | intersection, union | yes | lifted VE\nApprox. LBP [24] | subset (hypercube), no inequality | intersection, union | yes | lifted message passing\nKnowledge Compilation (KC) [10, 7] | eq/ineq, subset | intersection, no union | no | first-order knowledge compilation\nLifted Inference from Other Side [13] | normal forms (no constraints) | none | yes | lifting rules: decomposer, binomial\nPTP [11] | eq/ineq, no subset | intersection, no union | no | lifted search & sampling: decomposer, binomial\nCurrent Work | eq/ineq, subset | intersection, union | yes | lifted search & sampling: decomposer, binomial, single occurrence\n\nTable 1: A comparison of constraint languages proposed in literature across four dimensions. The deficiencies/missing properties for each language are noted in the table. Among the existing work, only KC allows for a full set of constraints. GCFOVE (tree-based) and LBP (hypercubes) allow for subset constraints but they do not explicitly handle inequality. PTP does not handle subset constraints. For constraint aggregation, most approaches allow only intersection of atomic constraints. GCFOVE and LBP allow union of intersections (DNF) but only deal with subset constraints. See footnote 4 in Broeck [7] regarding KC. Lifted VE, KC and PTP use a general purpose constraint solver which may not be tractable. Our approach allows for all the features discussed above and uses a tractable solver. We propose a constrained solution for lifted search and sampling. Among earlier work, only PTP has looked at this problem (both search and sampling). However, it only allows
However, it only allows\na very restrictive set of constraints.\n\nhas been some work studying this problem in the context of lifted VE [25], lifted BP [24], and lifted\nknowledge compilation [10], existing literature lacks any systematic treatment of this issue in the\ncontext of lifted search and sampling based algorithms. This paper focuses on addressing this issue.\nTable 1 presents a detailed comparison of various constraint languages for lifted inference to date.\nWe make the following contributions. First, we propose a new constraint language called setineq,\nwhich allows for subset (i.e., allowed values are constrained to be either inside a subset or outside\na subset), equality and inequality constraints (called atomic constraints) over substitutions of the\nvariables. The set of allowed constraints is expressed as a union over individual constraint tuples,\nwhich in turn are conjunctions over atomic constraints. Our constraint language strictly subsumes\nseveral of the existing constraint representations and yet allows for ef\ufb01cient constraint processing,\nand more importantly does not require a separate constraint solver. Second, we extend the three main\nlifted inference rules: decomposer and binomial [13], and single occurrence [18] for MAP inference,\nto work with our proposed constraint language. We provide a detailed analysis of the lifted inference\nrules in our constraint formalism and formally prove that the normal form representation is strictly\nsubsumed by our constraint formalism. Third, we show that evidence can be ef\ufb01ciently represented\nin our constraint formulation and is a key bene\ufb01t of our approach. Speci\ufb01cally, based on the earlier\nwork of Singla et al. [24], we provide an ef\ufb01cient (greedy) approach to convert the given evidence\nin the database tuple form to our constraint representation. 
Finally, we demonstrate experimentally that our new approach is superior to normal forms as well as many other existing approaches on several benchmark MLNs for both exact and approximate inference.\n\n2 Markov Logic\n\nWe will use a strict subset of first order logic [22], which is composed of constant, variable, and predicate symbols. A term is a variable or a constant. A predicate represents a property of or relation between terms, and takes a finite number of terms as arguments. A literal is a predicate or its negation. A formula is recursively defined as follows: (1) a literal is a formula, (2) negation of a formula is a formula, (3) if f1 and f2 are formulas then applying binary logical operators such as \u2227 and \u2228 to f1 and f2 yields a formula and (4) if x is a variable in a formula f, then \u2203x f and \u2200x f are formulas. A first order theory (knowledge base (KB)) is a set of quantified formulas. We will restrict our attention to function-free finite first order logic theories with Herbrand interpretations [22], as done by most earlier work in this domain [5]. We will also restrict our attention to the case of universally quantified variables. A ground atom is a predicate whose terms do not contain any variable in them. Similarly, a ground formula is a formula that has no variables. During the grounding of a theory, each formula is replaced by a conjunction over ground formulas obtained by substituting the universally quantified variables by constants appearing in the theory.\nA Markov logic network (MLN) [5] (or a Markov logic theory) is defined as a set of pairs {fi, wi}, i = 1, . . . , m, where fi is a first-order formula and wi is its weight, a real number. Given a finite set of constants C, a Markov logic theory represents a Markov network that has one node for every ground atom in the theory and a feature for every ground formula. The probability distribution represented by the Markov network is given by P(\u03b8) = (1/Z) exp(\u2211_{i=1}^m wi ni(\u03b8)), where ni(\u03b8) denotes the number of true groundings of the ith formula under the assignment \u03b8 to the ground atoms (world) and Z = \u2211_{\u03b8'} exp(\u2211_{i=1}^m wi ni(\u03b8')) is the normalization constant, called the partition function. It is well known that the prototypical marginal inference task in MLNs \u2013 computing the marginal probability of a ground atom given evidence \u2013 can be reduced to computing the partition function [11]. Another key inference task is MAP inference in which the goal is to find an assignment to ground atoms that has the maximum probability.\nIn its standard form, a Markov logic theory is assumed to be constraint free, i.e. all possible substitutions of variables by constants are considered during the grounding process. In this paper, we introduce the notion of a constrained Markov logic theory which is specified as a set of triplets {fi, wi, Sx_i}, i = 1, . . . , m, where Sx_i specifies a set (union) of constraints defined over the variables x appearing in the formula. During the grounding process, we restrict to those constant substitutions which satisfy the constraint set associated with a formula. The probability distribution is now defined using the restricted set of groundings allowed by the respective constraint sets over the formulas in the theory. Although we focus on MLNs in this paper, our results can be easily generalized to other representations including weighted parfactors [3] and probabilistic knowledge bases [11].\n\n3 Constraint Language\n\nIn this section, we formally define our constraint language and its canonical form. We also define two operators, join and project, for our language. 
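For concreteness, the distribution defined in Section 2 can be checked by brute-force enumeration on a toy unconstrained theory (an illustrative Python sketch; the theory, names and weight below are our own, not the authors' code). For the single formula Smokes(p) \u21d2 Cancer(p) with weight w over constants {A, B}, the partition function factorizes per constant as (3e^w + 1)^2:

```python
import itertools
import math

def brute_force_partition(constants, w):
    # Ground atoms: Smokes(c) and Cancer(c) for every constant c.
    atoms = [(p, c) for p in ('S', 'C') for c in constants]
    Z = 0.0
    for bits in itertools.product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, bits))
        # n = number of true groundings of Smokes(p) => Cancer(p)
        n = sum(1 for c in constants
                if (not world[('S', c)]) or world[('C', c)])
        Z += math.exp(w * n)
    return Z

# Each constant contributes independently: 3 of the 4 (Smokes, Cancer)
# combinations satisfy the implication, so Z = (3*e^w + 1)^|C|.
```

Such exponential sums are what the lifted rules of Section 4 avoid; the per-constant factorization in this toy case already hints at the decomposer.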
The various features, operators, and properties of the constraint language presented in this section will be useful when we formally extend various lifted inference rules to the constrained Markov logic theory in the next section (Sec. 4).\nLanguage Specification. For simplicity of exposition, we assume that all logical variables take values from the same domain C. Let x = {x1, x2, . . . , xk} be a set of logical variables. Our constraint language, called setineq, contains three types of atomic constraints: (1) subset constraints (setct), of the form xi \u2208 C (setinct), or xi \u2209 C (setoutct); (2) equality constraints (eqct), of the form xi = xj; and (3) inequality constraints (ineqct), of the form xi \u2260 xj. We will denote an atomic constraint over set x by Ax. A constraint tuple over x, denoted by T x, is a conjunction of atomic constraints over x, and a constraint set over x, denoted by Sx, is a disjunction of constraint tuples over x. An example of a constraint set over a pair of variables x = {x1, x2} is Sx = T x_1 \u2228 T x_2, where T x_1 = [x1 \u2208 {A, B} \u2227 x1 \u2260 x2 \u2227 x2 \u2208 {B, D}], and T x_2 = [x1 \u2209 {A, B} \u2227 x1 = x2 \u2227 x2 \u2208 {B, D}]. An assignment v to the variables in x is a solution of T x if all constraints in T x are satisfied by v. Since Sx is a disjunction, by definition, v is also a solution of Sx.\nNext, we define a canonical representation for our constraint language. We require this definition because symmetries can be easily identified when constraints are expressed in this representation. We begin with some required definitions. The support of a subset constraint is the set of values in C that satisfies the constraint. Two subset constraints Ax1 and Ax2 are called value identical if V1 = V2, and value disjoint if V1 \u2229 V2 = \u03c6, where V1 and V2 are the supports of Ax1 and Ax2, respectively. 
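The running example above is easy to encode directly; the following sketch (our own illustrative encoding, not the authors' implementation) represents constraint tuples as lists of atomic constraints and checks solutions of a constraint set:

```python
# Atomic constraints: ('in', var, values), ('notin', var, values),
# ('eq', var1, var2), ('neq', var1, var2).
def satisfies(tup, assign):
    # A constraint tuple is a conjunction of atomic constraints.
    for c in tup:
        kind = c[0]
        if kind == 'in' and assign[c[1]] not in c[2]:
            return False
        if kind == 'notin' and assign[c[1]] in c[2]:
            return False
        if kind == 'eq' and assign[c[1]] != assign[c[2]]:
            return False
        if kind == 'neq' and assign[c[1]] == assign[c[2]]:
            return False
    return True

def solves(cset, assign):
    # A constraint set is a disjunction of constraint tuples.
    return any(satisfies(t, assign) for t in cset)

# Sx = Tx1 v Tx2 from the running example.
T1 = [('in', 'x1', {'A', 'B'}), ('neq', 'x1', 'x2'), ('in', 'x2', {'B', 'D'})]
T2 = [('notin', 'x1', {'A', 'B'}), ('eq', 'x1', 'x2'), ('in', 'x2', {'B', 'D'})]
Sx = [T1, T2]
```

For instance, (x1 = A, x2 = B) satisfies T1, (x1 = D, x2 = D) satisfies T2, while (x1 = B, x2 = B) violates both tuples.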
A constraint tuple T x is transitive over equality if it contains the transitive closure of all its equality constraints. A constraint tuple T x is transitive over inequality if for every constraint of the form xi = xj in T x, whenever T x contains xi \u2260 xk, it also contains xj \u2260 xk.\nDefinition 3.1. A constraint tuple T x is in canonical form if the following three conditions are satisfied: (1) for each variable xi \u2208 x, there is exactly one subset constraint in T x, (2) all equality and inequality constraints in T x are transitive and (3) all pairs of variables x1, x2 that participate either in an equality or an inequality constraint have identical supports. A constraint set Sx is in canonical form if all of its constituent constraint tuples are in canonical form.\nWe can easily express a constraint set in an equivalent canonical form by enforcing the three conditions, one by one, on each of its tuples. In our running example, T x_1 can be converted into canonical form by splitting it into a set of four constraint tuples {T x_11, T x_12, T x_13, T x_14}, where T x_11 = [x1 \u2208 {B} \u2227 x1 \u2260 x2 \u2227 x2 \u2208 {B}], T x_12 = [x1 \u2208 {B} \u2227 x2 \u2208 {D}], T x_13 = [x1 \u2208 {A} \u2227 x2 \u2208 {B}], and T x_14 = [x1 \u2208 {A} \u2227 x2 \u2208 {D}]. Similarly for T x_2. We include the conversion algorithm in the supplement due to lack of space. The following theorem summarizes its time complexity.\nTheorem 3.1.* Given a constraint set Sx, each constraint tuple T x in it can be converted to canonical form in time O(mk + k^3) where m is the total number of constants appearing in any of the subset constraints in T x and k is the number of variables in x.\nWe define the following two operations in our constraint language.\nJoin: The join operation lets us combine a set of constraints (possibly defined over different sets of variables) into a single constraint. It will be useful when constructing formulas given constrained predicates (refer Section 4). Let T x and T y be constraint tuples over sets of variables x and y, respectively, and let z = x \u222a y. The join operation, written as T x \u22c8 T y, results in a constraint tuple T z which has the conjunction of all the constraints present in T x and T y. Given the constraint tuple T x_1 in our running example and T y = [x1 \u2260 y \u2227 y \u2208 {E, F}], T x_1 \u22c8 T y results in [x1 \u2208 {A, B} \u2227 x1 \u2260 x2 \u2227 x1 \u2260 y \u2227 x2 \u2208 {B, D} \u2227 y \u2208 {E, F}]. The complexity of the join operation is linear in the size of the constraint tuples being joined.\nProject: The project operation lets us eliminate a variable from a given constraint tuple. This is a key operation required in the application of the Binomial rule (refer Section 4). Let T x be a constraint tuple. Given xi \u2208 x, let \u00afxi = x \\ {xi}. The project operation, written as \u03a0_\u00afxi T x, results in a constraint tuple T \u00afxi which contains those constraints in T x not involving xi. We refer to T \u00afxi as the projected constraint for the variables \u00afxi. Given a solution \u00afxi = \u00afvi to T \u00afxi, the extension count for \u00afvi is defined as the number of unique assignments xi = vi such that (\u00afxi = \u00afvi, xi = vi) is a solution for T x. T \u00afxi is said to be count preserving if each of its solutions has the same extension count. We require a tuple to be count preserving in order to correctly maintain the count of the number of solutions during the project operation (also refer Section 4.3).\nLemma 3.1. 
* Let T x be a constraint tuple in its canonical form. If xi \u2208 x is a variable which is either involved only in a subset constraint or is involved in at least one equality constraint, then the projected constraint T \u00afxi is count preserving. In the former case, the extension count is given by the size of the support of xi. In the latter case, it is equal to 1.\n\nWhen dealing with inequality constraints, the extension count for each solution \u00afvi to the projected constraint T \u00afxi may not be the same and we need to split the constraint first in order to apply the project operation. For example, consider the constraint [x1 \u2260 x2 \u2227 x1 \u2260 x3 \u2227 x1, x2, x3 \u2208 {A, B, C}]. Then, the extension count for the solution x2 = A, x3 = B to the projected constraint T \u00afx1 is 1, whereas the extension count for the solution x2 = x3 = A is 2. In such cases, we need to split the tuple T x into multiple constraints such that the extension count property is preserved in each split. Let \u00afxi be a set of variables over which a constraint tuple T x needs to be projected. Let y \u2282 x be the set of variables with which xi is involved in an inequality constraint in T x. Then, the tuple T x can be broken into an equivalent constraint set by considering each possible division of y into a set of equivalence classes where variables in the same equivalence class are constrained to be equal and variables in different equivalence classes are constrained to be not equal to each other. The number of such divisions is given by the Bell number [15]. The divisions inconsistent with the already existing constraints over variables in y can be ignored. 
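The non-uniform extension counts in the example above can be verified mechanically (a small illustrative check in Python, with the paper's symbolic splitting replaced by brute-force enumeration):

```python
def extension_counts(domain):
    # T = [x1 != x2 ^ x1 != x3 ^ x1, x2, x3 in domain].
    # For each solution (x2, x3) of the projected constraint,
    # count the x1 values that extend it to a solution of T.
    counts = {}
    for x2 in domain:
        for x3 in domain:
            counts[(x2, x3)] = sum(1 for x1 in domain
                                   if x1 != x2 and x1 != x3)
    return counts
```

Over {A, B, C} the counts differ (1 when x2 \u2260 x3, 2 when x2 = x3), so the projection is not count preserving and the tuple must first be split by equating or dis-equating x2 and x3, exactly the equivalence-class divisions described above.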
The projection operation has a linear time complexity once the extension count property has been ensured using splitting as described above (see the supplement for details).\n\n4 Extending Lifted Inference Rules\n\nWe extend three key lifted inference rules: decomposer [13], binomial [13] and the single occurrence [18] (for MAP) to work with our constraint formulation. The exposition for single occurrence has been moved to the supplement due to lack of space. We begin by describing some important definitions and assumptions. Let M be a constrained MLN theory represented by a set of triplets {(fi, wi, Sx_i)}, i = 1, . . . , m. We make three assumptions. First, we assume that each constraint set Sx_i is specified using setineq and is in canonical form. Second, we assume that each formula in the MLN is constant free. This can be achieved by replacing the appearance of a constant by a variable and introducing an appropriate constraint over the new variable (e.g., replacing A by a variable x and a constraint x \u2208 {A}). Third, we assume that the variables have been standardized apart, i.e., each formula has a unique set of variables associated with it. In the following, x will denote the set of all the (logical) variables appearing in M. xi will denote the set of variables in fi. Similar to the work done earlier [13, 18], we divide the variables into a set of equivalence classes. Two variables are Tied to each other if they appear as the same argument of a predicate. We take the transitive closure of the Tied relation to obtain the variable equivalence classes. For example, given the theory: P(x) \u21d2 Q(x, y); Q(u, v) \u21d2 R(v); R(w) \u21d2 T(w, z), the variable equivalence classes are {x, u}, {y, v, w} and {z}. We will use the notation \u02c6x to denote the equivalence class to which x belongs.\n\n4.1 Motivation and Key Operations\n\nThe key intuition behind our approach is as follows. Let x be a variable appearing in a formula fi. 
Let T xi be an associated constraint tuple and let V denote the support for x in T xi. Then, since constraints are in canonical form, for any other variable x' \u2208 xi involved in an (in)equality constraint with x, with V' as the support, we have V = V'. Therefore, every pair of values vi, vj \u2208 V behave identically with respect to the constraint tuple T xi and hence, are symmetric to each other. Now, we could extend this notion to other constraints in which x appears provided the support sets {Vl}, l = 1, . . . , r, of x in all such constraints are either identical or disjoint. We could treat each support set Vl for x as a symmetric group of constants which could be argued about in unison. In an unconstrained theory, there is a single disjoint partition of constants, i.e. the entire domain, such that the constants behave identically. Our approach generalizes this idea to groups of constants which behave identically with each other. Towards this end, we define the following two key operations over the theory which will be used over and over again during the application of lifted inference rules.\nPartitioning Operation: We require the support sets of a variable (or sets of variables) over which a lifted rule is being applied to be either identical or disjoint. We say that a theory M defined over a set of (logical) variables x is partitioned with respect to the variables in the set y \u2286 x if for every pair of subset constraints Ax1 and Ax2, x1, x2 \u2208 y appearing in tuples of Sx, the supports of Ax1 and Ax2 are either identical or disjoint (but not both). Given a partitioned theory with respect to variables y, we use V y = {V y_l}, l = 1, . . . , r, to denote the set of various supports of variables in y. We refer to the set V y as the partition of y values in M. Our partitioning algorithm considers all the support sets for variables in y and splits them such that all the splits are identical or disjoint. 
The constraint tuples can then be split and represented in terms of these fine-grained support sets. We refer the reader to the supplement for a detailed description of our partitioning algorithm.\nRestriction Operation: Once the values of a set of variables y have been partitioned into a set {V y_l}, l = 1, . . . , r, while applying the lifted inference rules we will often need to argue about those formula groundings which are obtained by restricting y values to those in a particular set V y_l (since values in each such support set behave identically to each other). Given x \u2208 y, let Ax_l denote a subset constraint over x with V y_l as its support. Given a formula fi, we define its restriction to the set V y_l as the formula obtained by replacing its associated constraint tuple T xi with a new constraint tuple of the form T xi \u2227 (\u22c0_j Axj_l), where the conjunction is taken over each variable xj \u2208 y which also appears in fi. The restriction of an MLN M to the set Vl, denoted by M y_l, is the MLN obtained by restricting each formula in M to the set Vl. The restriction operation can be implemented in a straightforward manner by taking a conjunction with the subset constraints having the desired support set for variables in y. We next define the formulation of our lifting rules in a constrained theory.\n\n4.2 Decomposer\n\nLet M be an MLN theory. Let x denote the set of variables appearing in M. Let Z(M) denote the partition function for M. We say that an equivalence class \u02c6x is a decomposer [13] of M if a) if x \u2208 \u02c6x occurs in a formula f \u2208 F, then x appears in every predicate in f and b) if xi, xj \u2208 \u02c6x, then xi, xj do not appear as different arguments of any predicate P. Let \u02c6x be a decomposer for M. Let M' be a new theory in which the domain of all the variables belonging to equivalence class \u02c6x has been reduced to a single constant. 
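The Tied equivalence classes and the two decomposer conditions lend themselves to a direct check. The sketch below uses our own toy encoding of formulas as lists of (predicate, argument-list) atoms; it is illustrative only, not the authors' code:

```python
def equivalence_classes(theory):
    # Union variables appearing as the same argument of a predicate,
    # then take the transitive closure (union-find style).
    parent = {}
    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            v = parent[v]
        return v
    slots = {}
    for formula in theory:
        for pred, args in formula:
            for pos, v in enumerate(args):
                slot = (pred, pos)
                if slot in slots:
                    parent[find(v)] = find(slots[slot])
                else:
                    slots[slot] = v
                    find(v)
    classes = {}
    for v in parent:
        classes.setdefault(find(v), set()).add(v)
    return list(classes.values())

def is_decomposer(theory, cls):
    for formula in theory:
        if not any(v in cls for _, args in formula for v in args):
            continue
        for _, args in formula:
            hits = [v for v in args if v in cls]
            # a) some class variable must occur in every atom of the
            # formula; b) no two class variables (or repeats) may fill
            # different argument positions of one atom.
            if len(hits) != 1:
                return False
    return True

# Running example: P(x) => Q(x,y); Q(u,v) => R(v); R(w) => T(w,z).
theory = [[('P', ['x']), ('Q', ['x', 'y'])],
          [('Q', ['u', 'v']), ('R', ['v'])],
          [('R', ['w']), ('T', ['w', 'z'])]]
```

On this theory the classes come out as {x, u}, {y, v, w} and {z}, and none of them is a decomposer (e.g. u fails to appear in R(v)); a one-formula theory such as P(a) \u2227 Q(a) does admit {a} as a decomposer.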
The decomposer rule [13] states that the partition function Z(M) can be re-written using Z(M') as Z(M) = (Z(M'))^m, where m = |Dom(\u02c6x)| in M. The proof follows from the fact that since \u02c6x is a decomposer, the theory can be decomposed into m independent but identical (up to the renaming of a constant) theories which do not share any random variables [13].\nNext, we extend the decomposer rule above to work with constrained theories. We assume that the theory has been partitioned with respect to the set of variables appearing in the decomposer \u02c6x. Let the partition of \u02c6x values in M be given by V \u02c6x = {V \u02c6x_l}, l = 1, . . . , r. Now, we define the decomposer rule for a constrained theory using the following theorem.\nTheorem 4.1.* Let M be a partitioned theory with respect to the decomposer \u02c6x. Let M \u02c6x_l denote the restriction of M to the partition element V \u02c6x_l. Let M'\u02c6x_l further restrict M \u02c6x_l to a singleton {v}, where v \u2208 V \u02c6x_l is some element in the set V \u02c6x_l. Then, the partition function Z(M) can be written as Z(M) = \u220f_{l=1}^r Z(M \u02c6x_l) = \u220f_{l=1}^r Z(M'\u02c6x_l)^{|V \u02c6x_l|}.\n\n4.3 Binomial\n\nLet M be an unconstrained MLN theory and P be a unary predicate. Let xj denote the set of variables appearing as the first argument of P. Let Dom(xj) = {ci}, i = 1, . . . , n, \u2200xj \u2208 xj. Let M P_k be the theory obtained from M as follows. Given a formula fi with weight wi in which P appears, wlog let xj denote the argument of P in fi. Then, for every such formula fi, we replace it by two new formulas, f t_i and f f_i, obtained by a) substituting true and false for the occurrence of P(xj) in fi, respectively, and b) when xj occurs in f t_i or f f_i, reducing the domain of xj to {ci}, i = 1, . . . , k, in f t_i and {ci}, i = k + 1, . . . , n, in f f_i. The weight wt_i of f t_i is equal to wi if it has an occurrence of xj, wi \u2217 k otherwise. Similarly for f f_i. The Binomial rule [13] states that the partition function Z(M) can be written as: Z(M) = \u2211_{k=0}^n C(n, k) Z(M P_k), where C(n, k) denotes the binomial coefficient and n = |Dom(xj)|. The proof follows from the fact that the calculation of Z can be divided into n + 1 cases, where each case corresponds to considering C(n, k) equivalent possibilities for k number of P groundings being true and n \u2212 k being false, k ranging from 0 to n.\nNext, we extend the above rule for a constrained theory M. Let P be a singleton predicate and xj be the set of variables appearing as first arguments of P as before. Let M be partitioned with respect to xj and let V xj = {V xj_l}, l = 1, . . . , r, denote the partition of xj values in M. Let F P denote the set of formulas in which P appears. For every formula fi \u2208 F P in which xj appears only in P(xj), assume that the projections over the set \u00afxj are count preserving. Then, we obtain a new MLN M P_{l,k} from M in the following manner. Given a formula fi \u2208 F P with weight wi in which P appears, do the following steps: 1) restrict fi to the set of values {v | v \u2209 V xj_l} for variable xj; 2) for the remaining tuples (i.e. where xj takes values from the set V xj_l), create two new formulas f t_i and f f_i obtained by restricting f t_i to the set {V xj_{l1}, . . . , V xj_{lk}} and f f_i to the set {V xj_{lk+1}, . . . , V xj_{lnl}}, respectively; here, the subscript nl = |V xj_l|; 3) canonicalize the constraints in f t_i and f f_i; 4) substitute true and false for P in f t_i and f f_i, respectively; 5) if xj appears in f t_i (after the substitution), its weight wt_i is equal to wi; otherwise, split f t_i into {f t_{id}}, d = 1, . . . , D, such that the projection over \u00afxj in each tuple of f t_{id} is count preserving with extension count given by et_{id}. The weight of each f t_{id} is wi \u2217 et_{id}. Similarly for f f_i.\nWe are now ready to define the Binomial formulation for a constrained theory:\nTheorem 4.2.* Let M be an MLN theory partitioned with respect to variable xj. Let P(xj) be a singleton predicate. Let the projections T \u00afxj of tuples associated with the formulas in which xj appears only in P(xj) be count preserving. Let V xj = {V xj_l}, l = 1, . . . , r, denote the partition of xj values in M and let nl = |V xj_l|. Then, the partition function Z(M) can be computed using the recursive application of the following rule for each l: Z(M) = \u2211_{k=0}^{nl} C(nl, k) Z(M P_{l,k}).\nWe apply Theorem 4.2 recursively for each partition component in turn to eliminate P(xj) completely from the theory. The Binomial application as described above involves \u220f_{l=1}^r (nl + 1) computations of Z whereas a direct grounding method would involve 2^{\u2211_l nl} computations (two possibilities for each grounding of P(xj) in turn). See the supplement for an example.\n\n4.4 Normal Forms and Evidence Processing\n\nNormal Forms: The normal form representation [13] is an unconstrained representation which requires that a) there are no constants in any formula fl \u2208 F and b) the domains of variables belonging to an equivalence class \u02c6x are identical to each other. 
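The complexity gap stated for the Binomial rule in Section 4.3 is simple arithmetic, sketched below (illustrative only; the partition cell sizes in the checks are made up):

```python
import math

def lifted_cost(cell_sizes):
    # One Z computation per k in 0..nl for each partition cell:
    # prod_l (nl + 1) subproblems in total.
    cost = 1
    for nl in cell_sizes:
        cost *= nl + 1
    return cost

def ground_cost(cell_sizes):
    # Two truth values for each grounding of P(xj): 2^(sum_l nl).
    return 2 ** sum(cell_sizes)

def binomial_case_weights(nl):
    # C(nl, k) symmetric assignments share a single Z(M_{l,k}) each;
    # summed over k they recover all 2^nl groundings of the cell.
    return [math.comb(nl, k) for k in range(nl + 1)]
```

For cells of sizes 3 and 4, the lifted rule solves (3+1)(4+1) = 20 subproblems while grounding would enumerate 2^7 = 128 assignments; the gap grows exponentially with the number of groundings.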
An (unconstrained) MLN theory with evidence can\nbe converted into normal form by a series of mechanical operations in time polynomial in the size\n\n(cid:80)\n\n\fDomain\n\nSource\n\nFriends &\nSmokers (FS)\nWebKB\n\nIMDB\n\nRules\nSmokes(p) \u21d2 Cancer(p); Smokes(p1)\n\u2227 Friends(p1,p2) \u21d2 Smokes(p2)\nPageClass(p1,+c1) \u2227 PageClass(p2,+c2)\n\nAlchemy\n[5]\nAlchemy\n[25],[24] \u21d2 Links(p1,p2)\nAlchemy Director(p) \u21d2 !WorksFor(p1,p2)\n[16]\n\nActor(p) \u21d2 !Director(p); Movie(m,p1)\n\u2227 WorksFor(p1,p2) \u21d2 Movie(m,p2)\n\nType (#\nof const.)\nperson (var)\n\npage (271)\nclass (5)\nperson(278)\nmovie (20)\n\nEvidence\n\nSmokes\nCancer\nPageClass\n\nActor\nDirector\nMovie\n\nTable 2: Dataset Details. var: domain size varied. \u2019+\u2019: a separate weight learned for each grounding\n\nof the theory and the evidence [13, 18]. Any variable values appearing as a constant in a formula\nor in evidence is split apart from the rest of the domain and a new variable with singleton domain\ncreated for them. Constrained theories can be normalized in a similar manner by 1) splitting apart\nthose variables appearing any subset constraints. 2) simple variable substitution for equality and 3)\nintroducing explicit evidence predicates for inequality. We can now state the following theorem.\nTheorem 4.3. * Let M be a constrained MLN theory. The application of the modi\ufb01ed lifting rules\nover this constrained theory can be exponentially more ef\ufb01cient than \ufb01rst converting the theory in\nthe normal form and then applying the original formulation of the lifting rules.\n\nj \u222a Ef\n\nj can be obtained by eliminating the set Et\n\nj is implicitly speci\ufb01ed. The \ufb01rst step in processing evidence is to convert the sets Et\n\nEvidence Processing: Given a predicate Pj(x1, . . . , xk) let Ej denote its associated evidence. 
Further, let Et_j (Ef_j) denote the set of ground atoms of Pj which are assigned true (false) in the evidence, and let Eu_j denote the set of groundings which are unknown (neither true nor false). Note that the set Eu_j is implicitly specified. The first step in processing evidence is to convert the sets Et_j and Ef_j into the constraint representation form for every predicate Pj. This is done by using the hypercube representation [24] over the set of variables appearing in predicate Pj. A hypercube over a set of variables can be seen as a constraint tuple specifying a subset constraint over each variable in the set. A union of hypercubes represents a constraint set representing the union of the corresponding constraint tuples. Finding a minimal hypercube decomposition is NP-hard, and we employ the greedy top-down hypercube construction algorithm proposed by Singla et al. [24] (Algorithm 2). The constraint representation for the implicit set Eu_j can be obtained by eliminating the set Et_j ∪ Ef_j from its bounding hypercube (i.e., the one which includes all the groundings in the set) and then calling the hypercube construction algorithm over the remaining set. Once the constraint representation has been created for every set of evidence (and non-evidence) atoms, we join them together to obtain the constrained representation. The join over constraints is implemented as described in Section 3.
5 Experiments
In our experiments, we compared the performance of our constrained formulation of the lifting rules with the normal forms for the task of calculating the partition function Z. We refer to our approach as SetInEq and to normal forms as Normal. We also compared with PTP [11], available in Alchemy 2, and with the GCFOVE [25] system¹. Both our systems and GCFOVE are implemented in Java; PTP is implemented in C++. We experimented on four benchmark MLN domains for calculating the partition function using exact as well as approximate inference. Table 2 shows the details of our datasets. Details for one of the domains, Professor and Students (PS) [11], are presented in the supplement due to lack of space.
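The evidence-to-hypercube conversion described above can be illustrated with a small sketch. This simplified version groups x-values that share the same set of y-partners, which yields a valid (though not necessarily minimal) decomposition; the paper instead uses the greedy top-down algorithm of Singla et al. [24], and all names here are illustrative:

```python
from collections import defaultdict

def hypercube_decomposition(atoms):
    """Decompose a set of true ground atoms P(x, y) into 'hypercubes'
    (X, Y) such that the union of the products X x Y covers the atoms
    exactly. Simplification: x-values with identical y-partner sets are
    merged into one hypercube (minimal decomposition is NP-hard)."""
    partners = defaultdict(set)
    for x, y in atoms:
        partners[x].add(y)          # y-values paired with each x
    groups = defaultdict(set)
    for x, ys in partners.items():
        groups[frozenset(ys)].add(x)  # merge x's sharing a partner set
    return [(frozenset(xs), ys) for ys, xs in groups.items()]

evidence = {("A", "u"), ("A", "v"), ("B", "u"), ("B", "v"), ("C", "w")}
cubes = hypercube_decomposition(evidence)
# e.g. two hypercubes: ({A, B} x {u, v}) and ({C} x {w})
```

Five ground atoms compress into two constraint tuples, which is the kind of symmetry the constrained representation exploits.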
Evidence was the only type of constraint considered in our experiments. The experiments on all the datasets except WebKB were carried out on a machine with a 2.20GHz Intel Core i3 CPU and 4GB RAM. WebKB is a much larger dataset, and we ran those experiments on a 2.20GHz Xeon(R) E5-2660 v2 server with 10 cores and 128GB RAM.
5.1 Exact Inference
We compared the performance of the various algorithms using exact inference on two of the domains: FS and PS. We do not compare the value of Z since we are dealing with exact inference. In the following, r% evidence on a type means that r% of the constants of the type are randomly selected and the evidence predicate groundings in which these constants appear are randomly set to true or false; the remaining evidence groundings are set to unknown. The y-axis is plotted on a log scale in the following three graphs. Figure 1a shows the results as the domain size of person is varied from 100 to 800 with 40% evidence in the FS domain. We timed out an algorithm after 1 hour. PTP failed to scale to even size 100 and is not shown in the figure. The time taken by Normal grows very fast, and it times out beyond size 500. SetInEq and GCFOVE have a much slower growth rate; SetInEq is about an order of magnitude faster than GCFOVE on all domain sizes. Figure 1b shows the time taken by the three algorithms as we vary the evidence on person with a fixed domain size of 500. For all the algorithms, the time first increases with evidence and then drops. SetInEq is up to an order of magnitude faster than GCFOVE and up to 3 orders of magnitude faster than Normal. Figure 1c plots the number of nodes expanded by Normal and SetInEq; the GCFOVE code did not provide any such equivalent value.

¹Alchemy 2: code.google.com/p/alchemy-2; GCFOVE: https://dtai.cs.kuleuven.be/software/gcfove
As expected, we see a much larger growth rate for Normal compared to SetInEq.

Figure 1: Results for exact inference on FS. (a) FS: size vs time (sec). (b) FS: evidence vs time (sec). (c) FS: size vs # nodes expanded.

5.2 Approximate Inference
For approximate inference, we could only compare Normal with SetInEq: GCFOVE does not have an approximate variant for computing marginals or the partition function, and PTP using importance sampling is not fully implemented in Alchemy 2. For approximate inference in both Normal and SetInEq, we used the unbiased importance sampling scheme described by Gogate & Domingos [11]. We collected a total of 1000 samples for each estimate and averaged the Z values. In all our experiments below, the log(Z) values calculated by the two algorithms were within 1% of each other; hence, the estimates are comparable with each other. We compared the performance of the two algorithms on two real world datasets, IMDB and WebKB (see Table 2). For WebKB, we experimented with the 5 most frequent page classes in the Univ. of Texas fold; it had close to 2.5 million ground clauses. IMDB has 5 equal sized folds with close to 15K groundings in each, and the results presented are averaged over the folds. Figure 2a (y-axis on log scale) shows the time taken by the two algorithms as we vary the subset of pages in our data from 0 to 270. The scaling behavior is similar to that observed earlier on the other datasets. Figure 2b plots the timing of the two algorithms as we vary the evidence % on IMDB. SetInEq is able to exploit symmetries with increasing evidence, whereas Normal's performance degrades.
6 Conclusion and Future Work
In this paper, we proposed a new constraint language called SetInEq for relational probabilistic models. Our constraint formalism subsumes most existing formalisms.
We defined efficient operations over our language using a canonical form representation, and extended three key lifting rules, viz. decomposer, binomial and single occurrence, to work with our constraint formalism. Experiments on benchmark MLNs validate the efficacy of our approach. Directions for future work include exploiting our constraint formalism to facilitate approximate lifting of the theory.
7 Acknowledgements
Happy Mittal was supported by the TCS Research Scholar Program. Vibhav Gogate was partially supported by the DARPA Probabilistic Programming for Advanced Machine Learning Program under AFRL prime contract number FA8750-14-C-0005. Parag Singla is supported by a Google travel grant to attend the conference. We thank Somdeb Sarkhel for helpful discussions.

Figure 2: Results using approximate inference on WebKB and IMDB. (a) WebKB: size vs time (sec). (b) IMDB: evidence % vs time (sec).

References
[1] Udi Apsel, Kristian Kersting, and Martin Mladenov. Lifting relational MAP-LPs using cluster signatures. In Proc. of AAAI-14, pages 2403–2409, 2014.
[2] H. Bui, T. Huynh, and S. Riedel. Automorphism groups of graphical models and lifted variational inference. In Proc. of UAI-13, pages 132–141, 2013.
[3] R. de Salvo Braz, E. Amir, and D. Roth. Lifted first-order probabilistic inference. In Proc. of IJCAI-05, pages 1319–1325, 2005.
[4] R. de Salvo Braz, E. Amir, and D. Roth. Lifted first-order probabilistic inference. In L. Getoor and B. Taskar, editors, Introduction to Statistical Relational Learning. MIT Press, 2007.
[5] P. Domingos and D. Lowd. Markov Logic: An Interface Layer for Artificial Intelligence. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2009.
[6] G. Van den Broeck. On the completeness of first-order knowledge compilation for lifted probabilistic inference. In Proc. of NIPS-11, pages 1386–1394, 2011.
[7] G. Van den Broeck. Lifted Inference and Learning in Statistical Relational Models. PhD thesis, KU Leuven, 2013.
[8] G. Van den Broeck. On the complexity and approximation of binary evidence in lifted inference. In Proc. of NIPS-13, 2013.
[9] G. Van den Broeck and J. Davis. Conditioning in first-order knowledge compilation and lifted probabilistic inference. In Proc. of AAAI-12, 2012.
[10] G. Van den Broeck, N. Taghipour, W. Meert, J. Davis, and L. De Raedt. Lifted probabilistic inference by first-order knowledge compilation. In Proc. of IJCAI-11, 2011.
[11] V. Gogate and P. Domingos. Probabilistic theorem proving. In Proc. of UAI-11, pages 256–265, 2011.
[12] V. Gogate, A. Jha, and D. Venugopal. Advances in lifted importance sampling. In Proc. of AAAI-12, pages 1910–1916, 2012.
[13] A. K. Jha, V. Gogate, A. Meliou, and D. Suciu. Lifted inference seen from the other side: The tractable features. In Proc. of NIPS-10, pages 973–981, 2010.
[14] K. Kersting, B. Ahmadi, and S. Natarajan. Counting belief propagation. In Proc. of UAI-09, pages 277–284, 2009.
[15] J. Kisyński and D. Poole. Constraint processing in lifted probabilistic inference. In Proc. of UAI-09, 2009.
[16] L. Mihalkova and R. Mooney. Bottom-up learning of Markov logic network structure. In Proceedings of the Twenty-Fourth International Conference on Machine Learning, pages 625–632, 2007.
[17] B. Milch, L. S. Zettlemoyer, K. Kersting, M. Haimes, and L. P. Kaelbling. Lifted probabilistic inference with counting formulas. In Proc. of AAAI-08, 2008.
[18] H. Mittal, P. Goyal, V. Gogate, and P. Singla. New rules for domain independent lifted MAP inference. In Proc. of NIPS-14, pages 649–657, 2014.
[19] M. Mladenov, A. Globerson, and K. Kersting. Lifted message passing as reparametrization of graphical models. In Proc. of UAI-14, pages 603–612, 2014.
[20] M. Mladenov and K. Kersting. Equitable partitions of concave free energies. In Proc. of UAI-15, 2015.
[21] D. Poole. First-order probabilistic inference. In Proc. of IJCAI-03, pages 985–991, 2003.
[22] S. J. Russell and P. Norvig. Artificial Intelligence: A Modern Approach (3rd edition). Pearson Education, 2010.
[23] P. Singla and P. Domingos. Lifted first-order belief propagation. In Proc. of AAAI-08, pages 1094–1099, 2008.
[24] P. Singla, A. Nath, and P. Domingos. Approximate lifted belief propagation. In Proc. of AAAI-14, pages 2497–2504, 2014.
[25] N. Taghipour, D. Fierens, J. Davis, and H. Blockeel. Lifted variable elimination with arbitrary constraints. In Proc. of AISTATS-12, Canary Islands, Spain, 2012.
[26] D. Venugopal and V. Gogate. On lifting the Gibbs sampling algorithm. In Proc. of NIPS-12, pages 1664–1672, 2012.