{"title": "An Integer Polynomial Programming Based Framework for Lifted MAP Inference", "book": "Advances in Neural Information Processing Systems", "page_first": 3302, "page_last": 3310, "abstract": "In this paper, we present a new approach for lifted MAP inference in Markov logic networks (MLNs). The key idea in our approach is to compactly encode the MAP inference problem as an Integer Polynomial Program (IPP) by schematically applying three lifted inference steps to the MLN: lifted decomposition, lifted conditioning, and partial grounding. Our IPP encoding is lifted in the sense that an integer assignment to a variable in the IPP may represent a truth-assignment to multiple indistinguishable ground atoms in the MLN. We show how to solve the IPP by first converting it to an Integer Linear Program (ILP) and then solving the latter using state-of-the-art ILP techniques. Experiments on several benchmark MLNs show that our new algorithm is substantially superior to ground inference and existing methods in terms of computational efficiency and solution quality.", "full_text": "An Integer Polynomial Programming Based\n\nFramework for Lifted MAP Inference\n\nSomdeb Sarkhel, Deepak Venugopal\n\nComputer Science Department\nThe University of Texas at Dallas\n\n{sxs104721,dxv021000}@utdallas.edu\n\nParag Singla\n\nDepartment of CSE\n\nI.I.T. Delhi\n\nparags@cse.iitd.ac.in\n\nVibhav Gogate\n\nComputer Science Department\nThe University of Texas at Dallas\nvgogate@hlt.utdallas.edu\n\nAbstract\n\nIn this paper, we present a new approach for lifted MAP inference in Markov\nlogic networks (MLNs). The key idea in our approach is to compactly encode the\nMAP inference problem as an Integer Polynomial Program (IPP) by schematically\napplying three lifted inference steps to the MLN: lifted decomposition, lifted\nconditioning, and partial grounding. 
Our IPP encoding is lifted in the sense that\nan integer assignment to a variable in the IPP may represent a truth-assignment\nto multiple indistinguishable ground atoms in the MLN. We show how to solve\nthe IPP by \ufb01rst converting it to an Integer Linear Program (ILP) and then solving\nthe latter using state-of-the-art ILP techniques. Experiments on several benchmark\nMLNs show that our new algorithm is substantially superior to ground inference\nand existing methods in terms of computational ef\ufb01ciency and solution quality.\n\n1\n\nIntroduction\n\nMany domains in AI and machine learning (e.g., NLP, vision, etc.) are characterized by rich relational\nstructure as well as uncertainty. Statistical relational learning (SRL) models [5] combine the power\nof \ufb01rst-order logic with probabilistic graphical models to effectively handle both of these aspects.\nAmong a number of SRL representations that have been proposed to date, Markov logic [4] is\narguably the most popular one because of its simplicity; it compactly represents domain knowledge\nusing a set of weighted \ufb01rst order formulas and thus only minimally modi\ufb01es \ufb01rst-order logic.\nThe key task over Markov logic networks (MLNs) is inference which is the means of answering\nqueries posed over the MLN. Although, one can reduce the problem of inference in MLNs to inference\nin graphical models by propositionalizing or grounding the MLN (which yields a Markov network),\nthis approach is not scalable. The reason is that the resulting Markov network can be quite large,\nhaving millions of variables and features. One approach to achieve scalability is lifted inference,\nwhich operates on groups of indistinguishable random variables rather than on individual variables.\nLifted inference algorithms identify groups of indistinguishable atoms by looking for symmetries\nin the \ufb01rst-order logic representation, grounding the MLN only as necessary. 
Naturally, when the number of such groups is small, lifted inference is significantly better than propositional inference. Starting with the work of Poole [17], researchers have invented a number of lifted inference algorithms. At a high level, these algorithms “lift” existing probabilistic inference algorithms (cf. [3, 6, 7, 21, 22, 23, 24]). However, many of these lifted inference algorithms have focused on the task of marginal inference, i.e., finding the marginal probability of a ground atom given evidence. For many problems of interest, such as those in vision and NLP, one is often interested in the MAP inference task, i.e., finding the most likely assignment to all ground atoms given evidence. In recent years, there has been a growing interest in lifted MAP inference. Notable lifted MAP approaches include exploiting uniform assignments for lifted MPE [1], lifted variational inference using graph automorphism [2], lifted likelihood-maximization for MAP [8], exploiting symmetry for MAP inference [15], and efficient lifting of MAP LP relaxations using k-locality [13]. However, a key problem with most of the existing lifted approaches is that they require significant modifications to be made to propositional inference algorithms, and for optimal performance require lifting several steps of propositional algorithms. This is time-consuming because one has to lift decades of advances in propositional inference.

To circumvent this problem, Sarkhel et al. [18] recently advocated using the “lifting as pre-processing” paradigm [20]. The key idea is to apply lifted inference as a pre-processing step and construct a Markov network that is lifted in the sense that its size can be much smaller than the ground Markov network and a complete assignment to its variables may represent several complete assignments in the ground Markov network.
Unfortunately, Sarkhel et al.\u2019s approach does not use existing research on lifted\ninference to the fullest extent and is ef\ufb01cient only when \ufb01rst-order formulas have no shared terms.\nIn this paper, we propose a novel lifted MAP inference approach which is also based on the \u201clifting as\npre-processing\u201d paradigm but unlike Sarkhel et al.\u2019s approach is at least as powerful as probabilistic\ntheorem proving [6], an advanced lifted inference algorithm. Moreover, our new approach can easily\nsubsume Sarkhel et al.\u2019s approach by using it as just another lifted inference rule. The key idea in\nour approach is to reduce the lifted MAP inference (maximization) problem to an equivalent Integer\nPolynomial Program (IPP). Each variable in the IPP potentially refers to an assignment to a large\nnumber of ground atoms in the original MLN. Hence, the size of search space of the generated IPP\ncan be signi\ufb01cantly smaller than the ground Markov network.\nOur algorithm to generate the IPP is based on the following three lifted inference operations which\nincrementally build the polynomial objective function and its associated constraints: (1) Lifted\ndecomposition [6] \ufb01nds sub-problems with identical structure and solves only one of them; (2) Lifted\nconditioning [6] replaces an atom with only one logical variable (singleton atom) by a variable in the\ninteger polynomial program such that each of its values denotes the number of the true ground atoms\nof the singleton atom in a solution; and (3) Partial grounding is used to simplify the MLN further so\nthat one of the above two operations can be applied.\nTo solve the IPP generated from the MLN we convert it to an equivalent zero-one Integer Linear\nProgram (ILP) using a classic conversion method outlined in [25]. A desirable characteristic of\nour reduction is that we can use any off-the-shelf ILP solver to get exact or approximate solution\nto the original problem. 
We used a parallel ILP solver, Gurobi [9] for this purpose. We evaluated\nour approach on multiple benchmark MLNs and compared with Alchemy [11] and Tuffy [14], two\nstate-of-the-art MLN systems that perform MAP inference by grounding the MLN, as well as with\nthe lifted MAP inference approach of Sarkhel et al. [18]. Experimental results show that our approach\nis superior to Alchemy, Tuffy and Sarkhel et al.\u2019s approach in terms of scalability and accuracy.\n\n2 Notation And Background\n\nPropositional Logic. In propositional logic, sentences or formulas, denoted by f, are composed of\nsymbols called propositions or atoms, denoted by upper case letters (e.g., X, Y , Z, etc.) that are\njoined by \ufb01ve logical operators \u2227 (conjunction), \u2228 (disjunction), \u00ac (negation), \u21d2 (implication) and\n\u21d4 (equivalence). Each atom takes values from the binary domain {true, f alse}.\nFirst-order Logic. An atom in \ufb01rst-order logic (FOL) is a predicate that represents relations between\nobjects. A predicate consists of a predicate symbol, denoted by Monospace fonts (e.g., Friends, R,\netc.), followed by a parenthesized list of arguments. A term is a logical variable, denoted by lower\ncase letters such as x, y, and z, or a constant, denoted by upper case letters such as X, Y , and Z.\nWe assume that each logical variable, say x, is typed and takes values from a \ufb01nite set of constants,\ncalled its domain, denoted by \u2206x. In addition to the logical operators, FOL includes universal \u2200 and\nexistential \u2203 quanti\ufb01ers. Quanti\ufb01ers express properties of an entire collection of objects. A formula in\n\ufb01rst order logic is an atom, or any complex sentence that can be constructed from atoms using logical\noperators and quanti\ufb01ers. For example, the formula \u2200x Smokes(x) \u21d2 Asthma(x) states that all\npersons who smoke have asthma. 
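As a small, purely illustrative sketch of what grounding means here, the following code grounds the formula above over a hypothetical three-constant domain and evaluates each grounding in one world; the constant names and the world are invented for the example.

```python
# Hypothetical illustration: ground the formula
#   forall x: Smokes(x) => Asthma(x)
# over a small finite domain and evaluate it in one world.
domain = ["Alice", "Bob", "Carol"]

# A world is a truth assignment to every ground atom.
world = {
    ("Smokes", "Alice"): True,  ("Asthma", "Alice"): True,
    ("Smokes", "Bob"):   True,  ("Asthma", "Bob"):   False,
    ("Smokes", "Carol"): False, ("Asthma", "Carol"): False,
}

# One grounding per constant: substitute the constant for x.
groundings = [(("Smokes", c), ("Asthma", c)) for c in domain]

def holds(world, grounding):
    """Smokes(c) => Asthma(c) is false only if Smokes(c) is true and Asthma(c) is false."""
    smokes, asthma = grounding
    return (not world[smokes]) or world[asthma]

satisfied = [g for g in groundings if holds(world, g)]
print(len(satisfied))  # 2: the grounding for Bob is violated
```

With a domain of n constants, this universally quantified formula has exactly n groundings, which is why grounding blows up so quickly for formulas with several logical variables.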
A knowledge base (KB) is a set of first-order formulas.

In this paper we use a subset of FOL which has no function symbols, equality constraints or existential quantifiers. We assume that formulas are standardized apart, namely no two formulas share a logical variable. We also assume that domains are finite and that there is a one-to-one mapping between constants and objects in the domain (Herbrand interpretations). We assume that each formula f is of the form ∀x f, where x is the set of variables in f (also denoted by V(f)) and f is a disjunction of literals (a clause), each literal being an atom or its negation. For brevity, we will drop ∀ from all formulas. A ground atom is an atom containing only constants. A ground formula is a formula obtained by substituting all of its variables with constants, namely a formula containing only ground atoms. A ground KB is a KB containing all possible groundings of all of its formulas.

Markov Logic. Markov logic [4] extends FOL by softening the hard constraints expressed by formulas and is arguably the most popular modeling language for SRL. A soft formula or a weighted formula is a pair (f, w) where f is a formula in FOL and w is a real number. A Markov logic network (MLN), denoted by M, is a set of weighted formulas (fi, wi). Given a set of constants that represent objects in the domain, a Markov logic network represents a Markov network or a log-linear model. The ground Markov network is obtained by grounding the weighted first-order knowledge base with one feature for each grounding of each formula.
The weight of the feature is the weight attached to the formula. The ground network represents the probability distribution P(ω) = (1/Z) exp(Σ_i wi N(fi, ω)), where ω is a world, namely a truth-assignment to all ground atoms, N(fi, ω) is the number of groundings of fi that evaluate to true given ω, and Z is a normalization constant.

For simplicity, we will assume that the MLN is in normal form and has no self joins, namely no two atoms in a formula have the same predicate symbol [10]. A normal MLN is an MLN that satisfies the following two properties: (i) there are no constants in any formula; and (ii) if two distinct atoms of predicate R have variables x and y as the same argument of R, then ∆x = ∆y. Because of the second condition, in normal MLNs, we can associate domains with each argument of a predicate. Moreover, for inference purposes, in normal MLNs, we do not have to keep track of the actual elements in the domain of a variable; all we need to know is the size of the domain [10]. Let iR denote the i-th argument of predicate R and let D(iR) denote the number of elements in the domain of iR. Henceforth, we will abuse notation and refer to normal MLNs as MLNs.

MAP Inference in MLNs. A common optimization inference task over MLNs is finding the most probable state of the world ω, that is, finding a complete assignment to all ground atoms which maximizes the probability. Formally,

arg max_ω PM(ω) = arg max_ω (1/Z(M)) exp(Σ_i wi N(fi, ω)) = arg max_ω Σ_i wi N(fi, ω)    (1)

From Eq. (1), we can see that the MAP problem in Markov logic reduces to finding the truth assignment that maximizes the sum of the weights of satisfied clauses. Therefore, any weighted satisfiability solver can be used to solve this problem.
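This reduction can be made concrete with a brute-force sketch: given a toy ground network (the clauses and weights below are hypothetical, not from any benchmark), MAP inference is just the truth assignment maximizing the total weight of satisfied clauses.

```python
from itertools import product

# Toy ground network (hypothetical clauses and weights).
# Each clause is a list of (atom, required_truth_value) literals.
atoms = ["R1", "R2", "S1", "S2"]
weighted_clauses = [
    ([("R1", True), ("S1", True)], 1.5),   # R1 v S1
    ([("R2", True), ("S2", True)], 1.5),   # R2 v S2
    ([("S1", False)], 0.5),                # !S1
    ([("S2", False)], 0.5),                # !S2
]

def weight(world):
    """Sum of the weights of the clauses satisfied by the world."""
    return sum(w for clause, w in weighted_clauses
               if any(world[a] == v for a, v in clause))

# MAP inference by exhaustive search over all 2^n truth assignments.
best = max(
    (dict(zip(atoms, vals)) for vals in product([False, True], repeat=len(atoms))),
    key=weight,
)
print(weight(best))  # 4.0: set R1, R2 true and S1, S2 false
```

The exhaustive search over 2^n worlds is exactly what practical solvers avoid, which motivates the solvers discussed next.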
The problem is NP-hard in general, but effective solvers exist, both exact and approximate. Examples of such solvers are MaxWalkSAT [19], a local search solver, and Clone [16], a branch-and-bound solver. Both of these algorithms are propositional and therefore they are unable to exploit the relational structure that is inherent to MLNs.

Integer Polynomial Programming (IPP). An IPP problem is defined as follows:

Maximize    f(x1, x2, ..., xn)
Subject to  g_j(x1, x2, ..., xn) ≥ 0    (j = 1, 2, ..., m)

where each xi takes finite integer values, and the objective function f(x1, x2, ..., xn) and each of the constraints g_j(x1, x2, ..., xn) are polynomials in x1, x2, ..., xn. We will compactly represent an integer polynomial programming problem (IPP) as an ordered triple I = ⟨f, G, X⟩, where X = {x1, x2, ..., xn} and G = {g1, g2, ..., gm}.

3 Probabilistic Theorem Proving Based MAP Inference Algorithm

We motivate our approach by presenting, in Algorithm 1, the most basic algorithm for lifted MAP inference. Algorithm 1 extends the probabilistic theorem proving (PTP) algorithm of Gogate and Domingos [6] to MAP inference and integrates it with Sarkhel et al.'s lifted MAP inference rule [18]. It is obtained by replacing the summation operator in the conditioning step of PTP by the maximization operator (PTP computes the partition function). Note that throughout the paper, we will present algorithms that compute the MAP value rather than the MAP assignment; the assignment can be recovered by tracing back the path that yielded the MAP value. We describe the steps in Algorithm 1 next, starting with some required definitions.

Algorithm 1 PTP-MAP(MLN M)
  if M is empty return 0
  Simplify(M)
  if M has disjoint MLNs M1, . . . , Mk then
    return Σ_{i=1}^{k} PTP-MAP(Mi)
  if M has a decomposer d such that D(i ∈ d) > 1 then
    return PTP-MAP(M|d)
  if M has an isolated atom R such that D(iR) > 1 then
    return PTP-MAP(M|{1R})
  if M has a singleton atom A then
    return max_{i=0}^{D(1A)} PTP-MAP(M|(A, i)) + w(A, i)
  Heuristically select an argument iR
  return PTP-MAP(M|G(iR))

Two arguments iR and jS are called unifiable if they share a logical variable in an MLN formula. Clearly, unifiable, if we consider it as a binary relation U(iR, jS), is symmetric and reflexive. Let Ū be the transitive closure of U. Given an argument iS, let Unify(iS) denote the equivalence class under Ū.

Simplification. In the simplification step, we simplify the predicates, possibly reducing their arity (cf. [6, 10] for details). An example simplification step is the following: if no atoms of a predicate share logical variables with other atoms in the MLN, then we can replace the predicate by a new predicate having just one argument; the domain size of the argument is the product of the domain sizes of the individual arguments.

Example 1. Consider a normal MLN with two weighted formulas: R(x1, y1) ∨ S(z1, u1), w1 and R(x2, y2) ∨ S(z2, u2) ∨ T(z2, v2), w2. We can simplify this MLN by replacing R by a predicate J having one argument such that D(1J) = D(1R) × D(2R). The new MLN has two formulas: J(x1) ∨ S(z1, u1), w1 and J(x2) ∨ S(z2, u2) ∨ T(z2, v2), w2.

Decomposition. If an MLN can be decomposed into two or more disjoint MLNs sharing no first-order atom, then the MAP solution is just a sum over the MAP solutions of all the disjoint MLNs.

Lifted Decomposition.
The main idea in lifted decomposition [6] is to identify identical but disconnected components in the ground Markov network by looking for symmetries in the first-order representation. Since the disconnected components are identical, only one of them needs to be solved, and the MAP value is the MAP value of one of the components times the number of components. One way of identifying identical disconnected components is by using a decomposer [6, 10], defined below.

Definition 1. [Decomposer] Given an MLN M having m formulas denoted by f1, . . . , fm, d = Unify(iR), where R is a predicate in M, is called a decomposer iff the following conditions are satisfied: (i) for each predicate R in M there is exactly one argument iR such that iR ∈ d; and (ii) in each formula fi, there exists a variable x such that x appears in all atoms of fi and, for each atom having predicate symbol R in fi, x appears at position iR ∈ d.

Denote by M|d the MLN obtained from M by setting the domain size of all elements iR of d to one and updating the weight of each formula that mentions R by multiplying it by D(iR). We can prove that:

Proposition 1. Given a decomposer d, the MAP value of M is equal to the MAP value of M|d.

Example 2. Consider a normal MLN M having two weighted formulas R(x) ∨ S(x), w1 and R(y) ∨ T(y), w2, where D(1R) = D(1S) = D(1T) = n. Here, d = {1R, 1S, 1T} is a decomposer. The MLN M|d is the MLN having the same two formulas as M with weights updated to nw1 and nw2, respectively. Moreover, in the new MLN, D(1R) = D(1S) = D(1T) = 1.

Isolated Singleton Rule. Sarkhel et al. [18] proved that if the MLN M has an isolated predicate R such that all atoms of R do not share any logical variables with other atoms, then one of the MAP solutions of M has either all ground atoms of R set to true or all of them set to false; namely, the solution lies at the extreme assignments to the groundings of R.
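This extreme-assignment property can be checked by exhaustive enumeration on a tiny example. The sketch below uses a hypothetical two-formula MLN, R(x) ∨ S(y), w1 and ¬R(x), w2, in which no atom of R shares a logical variable with any other atom; the weights and domain size are invented for the illustration.

```python
from itertools import product

# Check the extreme-assignment property on the toy MLN
#   R(x) v S(y), w1    and    !R(x), w2
# where no atom of R shares a logical variable with any other atom.
# Weights and domain size below are hypothetical.
n, w1, w2 = 2, 1.0, 0.6

def total_weight(r_vals, s_vals):
    total = sum(w1 for i, j in product(range(n), repeat=2)
                if r_vals[i] or s_vals[j])          # groundings of R(x) v S(y)
    total += w2 * sum(not r for r in r_vals)        # groundings of !R(x)
    return total

assignments = list(product([False, True], repeat=n))
best = max(total_weight(r, s) for r in assignments for s in assignments)

# Restrict R to its two extreme assignments (all false / all true).
extreme = max(total_weight(r, s)
              for r in [(False,) * n, (True,) * n]
              for s in assignments)

print(best == extreme)  # True: an optimal solution sets R's groundings uniformly
```

Because some optimum always lies at one of the two extremes, the 2^n assignments to R's groundings can be replaced by a single binary choice, which is exactly what makes the rule a lifting opportunity.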
Since we simplify the MLN, all such predicates R have only one argument, namely, they are singleton. Therefore, the following proposition is immediate:

Proposition 2. If M has an isolated singleton predicate R, then the MAP value of M equals the MAP value of M|{1R} (the notation M|{1R} is defined just after the definition of the decomposer).

Lifted Conditioning over Singletons. Performing a conditioning operation on a predicate means conditioning on all possible ground atoms of that predicate. Naïvely, it can result in an exponential number of alternate MLNs that need to be solved, one for each assignment to all groundings of the predicate. However, if the predicate is singleton, we can group these assignments into equi-probable sets based on the number of true groundings of the predicate (a counting assignment) [6, 10, 12]. In this case, we say that the lifted conditioning operator is applicable. For a singleton A, we denote the counting assignment as the ordered pair (A, i), which the reader should interpret as: exactly i groundings of A are true and the remaining are false.

We denote by M|(A, i) the MLN obtained from M as follows. For each element jR in Unify(1A) (in some order), we split the predicate R into two predicates R1 and R2 such that D(jR1) = i and D(jR2) = D(1A) − i. We then rewrite all formulas using these new predicate symbols. Assume that A is split into two predicates A1 and A2, respectively, with D(1A1) = i and D(1A2) = D(1A) − i. Then, we delete all formulas in which either A1 appears positively or A2 appears negatively (because they are satisfied). Next, we delete all literals of A1 and A2 from all formulas in the MLN. The weights of all formulas (which are not deleted) remain unchanged except those formulas in which atoms of A1 or A2 do not share logical variables with other atoms.
The weight of each such formula f with weight w is changed to w × D(1A1) if A1 appears in the clause, or to w × D(1A2) if A2 appears in the clause.

The weight w(A, i) is calculated as follows. Let F(A1) and F(A2) denote the sets of satisfied formulas (which are deleted) in which A1 and A2 participate, respectively. We introduce some additional notation. Let V(f) denote the set of logical variables in a formula f. Given a formula f, for each variable y ∈ V(f), let iR(y) denote the position of the argument of a predicate R such that y appears at that position in an atom of R in f. Then, w(A, i) is given by:

w(A, i) = Σ_{k=1}^{2} Σ_{fj ∈ F(Ak)} wj Π_{y ∈ V(fj)} D(iR(y))

We can show that:

Proposition 3. Given an MLN M having a singleton atom A, the MAP value of M equals max_{i=0}^{D(1A)} MAP-value(M|(A, i)) + w(A, i).

Example 3. Consider a normal MLN M having two weighted formulas R(x) ∨ S(x), w1 and R(y) ∨ S(z), w2, with domain sizes D(1R) = D(1S) = n. The MLN M|(R, i) is the MLN having three weighted formulas: S2(x2), w1; S1(x1), w2(n − i); and S2(x3), w2(n − i), with domains D(1S1) = i and D(1S2) = n − i. The weight w(R, i) = iw1 + niw2.

Partial grounding. In the absence of a decomposer, or when the singleton rule is not applicable, we will have to partially ground a predicate. For this, we heuristically select an argument iR to ground. Let M|G(iR) denote the MLN obtained from M as follows. For each argument iS ∈ Unify(iR), we create D(iS) new predicates which have all the arguments of S except iS. We then update all formulas with the new predicates. For example,

Example 4. Consider an MLN with two formulas: R(x, y) ∨ S(y, z), w1 and S(a, b) ∨ T(a, c), w2. Let D(2R) = 2.
After grounding 2R, we get an MLN having four formulas: R1(x1) ∨ S1(z1), w1; R2(x2) ∨ S2(z2), w1; S1(b1) ∨ T1(c1), w2; and S2(b2) ∨ T2(c2), w2.

Since partial grounding will create many new clauses, we try to use this operator as sparingly as possible. The following theorem is immediate from [6, 18] and the discussion above.

Theorem 1. PTP-MAP(M) computes the MAP value of M.

4 Integer Polynomial Programming Formulation for Lifted MAP

PTP-MAP performs an exhaustive search over all possible lifted assignments in order to find the optimal MAP value. It can be very slow without proper pruning, which is why branch-and-bound algorithms are widely used for many similar optimization tasks. A branch-and-bound algorithm maintains the best solution found so far as a global lower bound. If the estimated upper bound of a node is not better than the lower bound, the node is pruned and the search continues with other branches. However, instead of developing a lifted-MAP-specific upper bound heuristic to improve Algorithm 1, we propose to encode the lifted search problem as an Integer Polynomial Programming (IPP) problem. This way we can use existing off-the-shelf advanced machinery, which includes pruning techniques, search heuristics, caching, problem decomposition and upper bounding techniques, to solve the IPP.

At a high level, our encoding algorithm runs PTP-MAP schematically, performing all steps in PTP-MAP except the search or conditioning step. Before we present our algorithm, we define schematic MLNs (SMLNs), the basic structure on which our algorithm operates. SMLNs are normal MLNs with two differences: (1) the weights attached to formulas are polynomials instead of constants; and (2) the domain sizes of arguments are linear expressions instead of constants.

Algorithm 2 presents our approach to encode the lifted MAP problem as an IPP problem.

Algorithm 2 SMLN-2-IPP(SMLN S)
  if S is empty return ⟨0, ∅, ∅⟩
  Simplify(S)
  if S has disjoint SMLNs then
    for disjoint SMLNs S1, ..., Sk in S
      ⟨fi, Gi, Xi⟩ = SMLN-2-IPP(Si)
    return ⟨Σ_{i=1}^{k} fi, ∪_{i=1}^{k} Gi, ∪_{i=1}^{k} Xi⟩
  if S has a decomposer d then
    return SMLN-2-IPP(S|d)
  if S has an isolated singleton R then
    return SMLN-2-IPP(S|{1R})
  if S has a singleton atom A then
    Introduce an IPP variable ‘i’
    Form a constraint g as ‘(0 ≤ i ≤ D(1A))’
    ⟨f, G, X⟩ = SMLN-2-IPP(S|(A, i))
    return ⟨f + w(A, i), G ∪ {g}, X ∪ {i}⟩
  Heuristically select an argument iR
  return SMLN-2-IPP(S|G(iR))

It mirrors Algorithm 1, with the only difference being at the lifted conditioning step. Specifically, in the lifted conditioning step, instead of going over all possible branches corresponding to all possible counting assignments, the algorithm uses a representative branch which has a variable associated with the corresponding counting assignment. All update steps described in the previous section remain unchanged, with the caveat that in S|(A, i), i is symbolic (an integer variable). At termination, Algorithm 2 yields an IPP. The following theorem is immediate from the correctness of Algorithm 1.

Theorem 2. Given an MLN M and its associated schematic MLN S, the optimum solution to the Integer Polynomial Programming problem returned by SMLN-2-IPP(S) is the MAP solution of M.

In the next three examples, we show the IPP output by Algorithm 2 on some example MLNs.

Example 5. Consider an MLN having one weighted formula: R(x) ∨ S(x), w1, such that D(1R) = D(1S) = n. Here, d = {1R, 1S} is a decomposer. By applying the decomposer rule, the weight of the formula becomes nw1 and the domain size is set to 1.
After conditioning on R, the objective function obtained is nw1r, and the formula changes to S(x), nw1(1 − r). After conditioning on S, the IPP obtained has objective function nw1r + nw1(1 − r)s and two constraints: 0 ≤ r ≤ 1 and 0 ≤ s ≤ 1.

Example 6. Consider an MLN having one weighted formula: R(x) ∨ S(y), w1, such that D(1R) = nx and D(1S) = ny. Here R and S are isolated, and therefore by applying the isolated singleton rule the weight of the formula becomes nxnyw1. This is similar to the previous example; only the weight of the formula is different. Therefore, substituting this new weight, the IPP output by Algorithm 2 has objective function nxnyw1r + nxnyw1(1 − r)s and two constraints 0 ≤ r ≤ 1 and 0 ≤ s ≤ 1.

Example 7. Consider an MLN having two weighted formulas: R(x) ∨ S(x), w1 and R(z) ∨ S(y), w2, such that D(1R) = D(1S) = n. On this MLN, the IPP output by Algorithm 2 has the objective function rw1 + r²w2 + rw2(n − r) + s2w1(n − r) + s2w2(n − r)² + s1w2(n − r)r and constraints 0 ≤ r ≤ n, 0 ≤ s1 ≤ 1 and 0 ≤ s2 ≤ 1. The operations that are applied, in order, are: lifted conditioning on R, creating two new predicates S1 and S2; decomposer on 1S1; decomposer on 1S2; and then lifted conditioning on S1 and S2, respectively.

4.1 Solving the Integer Polynomial Programming Problem

Although we can directly solve the IPP using any off-the-shelf mathematical optimization software, IPP solvers are not as mature as Integer Linear Programming (ILP) solvers. Therefore, for efficiency reasons, we propose to convert the IPP to an ILP using the classic method outlined in [25] (we skip the details for lack of space). The method first converts the IPP to a zero-one polynomial programming problem and then linearizes it by adding additional variables and constraints for each higher-degree term.
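The two conversion steps can be sketched on a single polynomial term. The sketch below assumes the standard product-linearization constraints z ≤ x, z ≤ y, z ≥ x + y − 1 (a common textbook device, not quoted from [25]) and verifies by enumeration that they force z to equal the product of two binaries.

```python
from itertools import product

# Sketch of the two conversion steps on a single polynomial term.
# (The constraint pattern below is the standard product linearization;
#  it is an assumption here, not quoted from the method in [25].)

# Step 1: a bounded integer variable 0 <= r <= 3 becomes two binary bits,
# r = b0 + 2*b1, so every value of r has a unique 0-1 encoding.
encodings = {b0 + 2 * b1 for b0, b1 in product([0, 1], repeat=2)}
assert encodings == {0, 1, 2, 3}

# Step 2: a product z = x*y of binary variables is replaced by the
# linear constraints z <= x, z <= y, z >= x + y - 1.
def linearized_product_ok(x, y, z):
    return z <= x and z <= y and z >= x + y - 1

for x, y in product([0, 1], repeat=2):
    feasible = [z for z in (0, 1) if linearized_product_ok(x, y, z)]
    assert feasible == [x * y]  # the only feasible z equals the product

print("exact on all binary inputs")
```

Applying the two steps repeatedly turns every monomial of the IPP into a fresh binary variable plus a handful of linear constraints, which is what lets a standard ILP solver take over.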
Once the problem is converted to an ILP problem, we can use any standard ILP solver to solve it. Next, we state a key property of this conversion in the following theorem.

Theorem 3. The search space for solving the IPP obtained from Algorithm 2 by using the conversion described in [25] is polynomial in the max-range of the variables.

Proof. Let n be the number of variables of the IPP problem, where each of the variables ranges from 0 to (d − 1) (i.e., for each variable, 0 ≤ vi ≤ d − 1). As we first convert everything to binary, the zero-one polynomial programming problem will have O(n log2 d) variables. If the highest degree of a term in the IPP problem is k, we will need to introduce O(log2 d^k) binary variables (as multiplying k variables, each bounded by d, will result in terms bounded by d^k) to linearize it. Since the search space of an ILP is exponential in the number of variables, the search space for solving the IPP problem is:

O(2^(n log2 d + log2 d^k)) = O(2^(n log2 d)) O(2^(k log2 d)) = O(d^n) O(d^k) = O(d^(n+k))

We conclude this section by summarizing the power of our new approach:

Theorem 4. The search space of the IPP returned by Algorithm 2 is smaller than or equal to the search space of the Integer Linear Program (ILP) obtained using the algorithm proposed in Sarkhel et al. [18], which in turn is smaller than the size of the search space associated with the ground Markov network.

5 Experiments

We used a parallelized ILP solver called Gurobi [9] to solve the ILPs generated by our algorithm as well as by the other competing algorithms used in our experimental study. We compared the performance of our new lifted algorithm (which we call IPP) with four other algorithms from the literature: Alchemy (ALY) [11], Tuffy (TUFFY) [14], ground inference based on ILP (ILP), and the lifted MAP (LMAP) algorithm of Sarkhel et al. [18]. Alchemy and Tuffy are two state-of-the-art open-source systems for learning and inference in MLNs.
Both of them first ground the MLN and then use an approximate solver, MaxWalkSAT [19], to compute the MAP solution. Unlike Alchemy, Tuffy uses clever database tricks to speed up computation. ILP is obtained by converting the MAP problem over the ground Markov network to an ILP. LMAP also converts the MAP problem to an ILP; however, its ILP encoding can be much more compact than the ones used by ground inference methods because it processes “non-shared atoms” in a lifted manner (see [18] for details). We used the following three MLNs to evaluate our algorithm:

(i) An MLN which we call Student that consists of the following four formulas:
Teaches(teacher, course) ∧ Takes(student, course) → JobOffers(student, company); Teaches(teacher, course); Takes(student, course); ¬JobOffers(student, company)

(ii) An MLN which we call Relationship that consists of the following four formulas:
Loves(person1, person2) ∧ Friends(person2, person3) → Hates(person1, person3); Loves(person1, person2); Friends(person1, person2); ¬Hates(person1, person2)

(iii) The Citation Information-Extraction (IE) MLN [11] from the Alchemy web page, consisting of five predicates and fourteen formulas.

To compare performance and scalability, we ran each algorithm on the aforementioned MLNs for varying time-bounds and recorded the solution quality (i.e., the total weight of false clauses) achieved by each. All our experiments were run on a third-generation i7 quad-core machine having 8GB RAM.

For the Student MLNs, results are shown in Fig 1(a)-(c). On the MLN having 161K clauses, ILP, LMAP and IPP converge quickly to the optimal answer, while TUFFY converges faster than ALY. For the MLN with 812K clauses, LMAP and IPP converge faster than ILP and TUFFY. ALY is unable to handle this large Markov network and runs out of memory. For the MLN with 8.1B clauses, only LMAP and IPP are able to produce a solution, with IPP converging much faster than LMAP.
On this large MLN, all three ground inference algorithms (ILP, ALY and TUFFY) ran out of memory.
Results for the Relationship MLN are shown in Fig. 1(d)-(f) and are similar to those for the Student MLN. On the MLNs with 9.2K and 29.7K clauses, ILP, LMAP and IPP converge faster than TUFFY and ALY, while TUFFY converges faster than ALY. On the largest MLN, having 1M clauses, only LMAP, ILP and IPP are able to produce a solution, with IPP converging much faster than the other two.
Results for the IE MLN are shown in Fig. 1(g)-(i) and paint a similar picture, with IPP outperforming the other algorithms as we increase the number of objects in the domain. In fact, on the largest IE MLN, having 15.6B clauses, only IPP is able to output a solution, while the other approaches ran out of memory.
In summary, as expected, the two lifted approaches, IPP and LMAP, are more accurate and scalable than the three propositional inference approaches: ILP, TUFFY and ALY. IPP not only scales much better but also converges much faster than LMAP, clearly demonstrating the power of our new approach.

(a) Student(1.2K, 161K, 200) (b) Student(2.7K, 812K, 450) (c) Student(270K, 8.1B, 45K)
(d) Relation(1.2K, 9.2K, 200) (e) Relation(2.7K, 29.7K, 450) (f) Relation(30K, 1M, 5K)
(g) IE(3.2K, 1M, 100) (h) IE(82.8K, 731.6M, 900) (i) IE(380K, 15.6B, 2.5K)

Figure 1: Cost vs. time: cost of unsatisfied clauses (smaller is better) vs. time for different domain sizes. Notation used to label each figure: MLN(num-variables, num-clauses, num-evidences). Note: the three quantities reported are for the ground Markov network associated with the MLN. Standard deviation is plotted as error bars.

6 Conclusion

In this paper we presented a general approach for lifted MAP inference in Markov logic networks (MLNs). The main idea in our approach is to encode the MAP problem as an Integer Polynomial Program (IPP) by schematically applying three lifted inference steps to the MLN: lifted decomposition, lifted conditioning and partial grounding.
To solve the IPP, we propose to convert it to an Integer Linear Program (ILP) using the classic method outlined in [25]. The virtue of our approach is that the resulting ILP can be much smaller than the one obtained from the ground Markov network. Moreover, our approach subsumes the recently proposed lifted MAP inference approach of Sarkhel et al. [18] and is at least as powerful as probabilistic theorem proving [6]. Perhaps the key advantage of our approach is that it runs lifted inference as a pre-processing step, reducing the size of the theory, and then applies advanced propositional inference algorithms to this theory without any modifications. Thus, we do not have to explicitly lift (and efficiently implement) decades' worth of research and advances on propositional inference algorithms; instead, we treat them as a black box.

Acknowledgments

This work was supported in part by the AFRL under contract number FA8750-14-C-0021, by the ARO MURI grant W911NF-08-1-0242, and by the DARPA Probabilistic Programming for Advanced Machine Learning Program under AFRL prime contract number FA8750-14-C-0005.
Any opinions, findings, conclusions, or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views or official policies, either expressed or implied, of DARPA, AFRL, ARO or the US government.

References

[1] Udi Apsel and Ronen I. Brafman. Exploiting uniform assignments in first-order MPE. In AAAI, pages 74–83, 2012.
[2] H. Bui, T. Huynh, and S. Riedel. Automorphism groups of graphical models and lifted variational inference. In UAI, 2013.
[3] R. de Salvo Braz. Lifted First-Order Probabilistic Inference. PhD thesis, University of Illinois, Urbana-Champaign, IL, 2007.
[4] P. Domingos and D. Lowd. Markov Logic: An Interface Layer for Artificial Intelligence. Morgan & Claypool, San Rafael, CA, 2009.
[5] L. Getoor and B. Taskar, editors. Introduction to Statistical Relational Learning. MIT Press, 2007.
[6] V. Gogate and P. Domingos. Probabilistic Theorem Proving. In UAI, pages 256–265. AUAI Press, 2011.
[7] V. Gogate, A. Jha, and D. Venugopal. Advances in Lifted Importance Sampling.
In AAAI, 2012.
[8] Fabian Hadiji and Kristian Kersting. Reduce and re-lift: Bootstrapped lifted likelihood maximization for MAP. In AAAI, pages 394–400, Seattle, WA, 2013. AAAI Press.
[9] Gurobi Optimization Inc. Gurobi Optimizer Reference Manual, 2014.
[10] A. Jha, V. Gogate, A. Meliou, and D. Suciu. Lifted Inference from the Other Side: The Tractable Features. In NIPS, pages 973–981, 2010.
[11] S. Kok, M. Sumner, M. Richardson, P. Singla, H. Poon, D. Lowd, J. Wang, and P. Domingos. The Alchemy System for Statistical Relational AI. Technical report, Department of Computer Science and Engineering, University of Washington, Seattle, WA, 2008. http://alchemy.cs.washington.edu.
[12] B. Milch, L. S. Zettlemoyer, K. Kersting, M. Haimes, and L. P. Kaelbling. Lifted Probabilistic Inference with Counting Formulas. In AAAI, pages 1062–1068, 2008.
[13] Martin Mladenov, Amir Globerson, and Kristian Kersting. Efficient Lifting of MAP LP Relaxations Using k-Locality. In AISTATS, 2014.
[14] Feng Niu, Christopher Ré, AnHai Doan, and Jude Shavlik. Tuffy: Scaling up statistical inference in Markov logic networks using an RDBMS. Proceedings of the VLDB Endowment, 4(6):373–384, 2011.
[15] Jan Noessner, Mathias Niepert, and Heiner Stuckenschmidt. RockIt: Exploiting parallelism and symmetry for MAP inference in statistical relational models. In AAAI, Seattle, WA, 2013.
[16] K. Pipatsrisawat and A. Darwiche. Clone: Solving Weighted Max-SAT in a Reduced Search Space. In AI, pages 223–233, 2007.
[17] D. Poole. First-Order Probabilistic Inference. In IJCAI, pages 985–991, Acapulco, Mexico, 2003. Morgan Kaufmann.
[18] Somdeb Sarkhel, Deepak Venugopal, Parag Singla, and Vibhav Gogate. Lifted MAP inference for Markov Logic Networks. In AISTATS, 2014.
[19] B. Selman, H. Kautz, and B. Cohen. Local Search Strategies for Satisfiability Testing.
In Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge, pages 521–532. American Mathematical Society, 1996.
[20] J. W. Shavlik and S. Natarajan. Speeding up inference in Markov logic networks by preprocessing to reduce the size of the resulting grounded network. In IJCAI, pages 1951–1956, 2009.
[21] P. Singla and P. Domingos. Lifted First-Order Belief Propagation. In AAAI, pages 1094–1099, Chicago, IL, 2008. AAAI Press.
[22] G. Van den Broeck, A. Choi, and A. Darwiche. Lifted relax, compensate and then recover: From approximate to exact lifted probabilistic inference. In UAI, pages 131–141, 2012.
[23] G. Van den Broeck, N. Taghipour, W. Meert, J. Davis, and L. De Raedt. Lifted Probabilistic Inference by First-Order Knowledge Compilation. In IJCAI, pages 2178–2185, 2011.
[24] D. Venugopal and V. Gogate. On Lifting the Gibbs Sampling Algorithm. In NIPS, pages 1655–1663, 2012.
[25] Lawrence J. Watters. Reduction of Integer Polynomial Programming Problems to Zero-One Linear Programming Problems. Operations Research, 15(6):1171–1174, 1967.