Paper ID: 1306
Title: On the Complexity and Approximation of Binary Evidence in Lifted Inference
Reviews

Submitted by Assigned_Reviewer_1

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
Every so often, you read a paper that makes you think, "Wow, I wish I'd thought of that!" This is one such paper.

This paper presents a method to do efficient lifted inference, even when there is evidence on binary predicates. This is a very significant contribution because lifted inference algorithms tend to scale very poorly with the amount of evidence available. It was recently shown that evidence on unary predicates can be efficiently incorporated into exact lifted inference algorithms, but that evidence on binary predicates, in general, cannot. This paper describes the special cases when such evidence can be incorporated efficiently: when the predicate's evidence matrix has low Boolean rank, it can be transformed into a set of unary predicates that can be efficiently incorporated as evience. When the evidence is not low-rank (which is the more common case), it may still be worth approximating it as a low-rank matrix in order to reap the benefits of lifted inference. This paper also shows how to do this and presents some basic experimental results in one domain.

This paper is very clearly written and contains a very significant theoretical concept that is key to lifted inference. At its core, the idea is very simple, but I predict that it will have a large impact in the lifted inference community. Most existing lifted inference algorithms are not much better than ground inference algorithms when there is a lot of evidence. This paper brings us one step closer to making lifted inference much more practical.

As far as weaknesses, the experimental results suggest that although the low-rank approximation leads to faster lifted inference, it also hurts accuracy. After enough iterations, ground MCMC is usually more accurate than lifted MCMC, even when the approximation has a relatively large rank (such as 75 or 150). Exact inference is constrained to much smaller rank approximations (< 10). It would be very interesting to see more experimental results on other datasets and using other algorithms, such as lifted BP, and it would be nice to see more realistic cases where the low-rank approximation leads to much better results.

Even so, the experiments do a good job of clearly demonstrating that the low-rank approximation leads to faster convergence of lifted MCMC due to the additional symmetries, as well as exploring the tradeoffs. Thus, I do not consider these weaknesses to be critical -- expanding the experiments can certainly be left for future work or a journal version of this paper.
Q2: Please summarize your review in 1-2 sentences
This paper contains a very significant theoretical result about the tractability of lifted inference that is likely to be high-impact. The experiments are preliminary, but do a good job of showing the tradeoffs of applying the proposed approximation on an interesting dataset.

Submitted by Assigned_Reviewer_4

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper proposes an approach to performing lifted inference in models for statistical relational learning when evidence is binary, i.e., relations between entities, and complex, i.e., symmetry breaking. Leveraging symmetry has been crucial to the success of previous work on lifted inference, but, the authors point out, many real-world problems do not have a great deal of symmetry to leverage. The authors address this problem by first introducing a theoretical description of the complexity of binary evidence, called Boolean rank, and relate that to the complexity of performing inference using that evidence. They then introduce a scheme for approximating evidence of high Boolean rank with similar evidence of a lower Boolean rank and using this approximate evidence for lifted inference. They evaluate their approach on WebKB data and compare lifted and non-lifted inference using the original evidence and evidence approximations of varying Boolean rank. By approximating evidence, they are able to improve speed without sacrificing a great deal of predictive accuracy.

This is a very well written paper. The exposition is clear, the problem investigated is an important one, and the introduced theory and strategy are original and compelling solutions.

A weakness of the paper is that only WebKB is considered. It would be nice to see if the paper's approach could scale Markov logic networks to bigger problems than WebKB.

One style note: NIPS style is to number the introduction as section 1.
Q2: Please summarize your review in 1-2 sentences
This paper introduces new theory and strategy for increasing the applicability of lifted inference by approximating evidence with modified evidence more conducive to lifted inference. It is a new approach to an important problem.

Submitted by Assigned_Reviewer_6

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
I should be upfront that though I am quite familiar with graphical models and Markov Random Fields, I'm far from an expert on Markov Logic Networks (MLNs). So I approached this paper through the lens of a generic graphical models person, and inevitably I tended to interpret the results in that way.

This paper is interested in inference (computing marginal probabilities and/or a normalization factor) in a MLN. Inference is "conditional" in the sense that one will first observe some "evidence", which gives a subset of allowed values to each variable or pair of variables. Evidence is "unary" if it constrains single variables only, and "binary" if it also constraints pairs. The main idea is the following: Though inference in large MLNs is generally hard, some existing "lifted" algorithms can perform inference efficiently in large models by exploiting special structure in special classes of MLNs. However, conditioning on binary evidence destroys this special structure.

This paper basically points out that, rather than conditioning on binary evidence, one could augment the graph with extra variables, and then condition on only on unary evidence. However, in the worst case, the needed number of new variables is exponential in the number of values each variable can take. However, if the binary evidence for a given pair happens to be (binary) low-rank, then a much smaller number of variables are needed. Thus, inference in general can be approached by (possibly approximately) finding a low-rank decomposition of each binary evidence term, adding extra variables, and then using the specialized unary inference algorithm.

The major technical result is Theorem 2 which states that if you have a "lifted" inference algorithm and all pairwise evidence admits a low-rank decomposition, than exact inference can be done in polynomial time. This can naturally (i.e. heuristically) be done approximately through approximate decomposition.

A couple other issues:

- Is Boolean rank essential? Could one alternatively state that inference is always exponential in the number of values that each variable can have? (Which is weaker than Boolean rank, but simpler to understand.) The experiments use a very large number of variables, which motivates the decomposition, but presumably many applications have small numbers, in which case decomposition is superfluous. Is this result still useful?

- Theorem 1 is strangely stated. I think this is previous work, in which case a reference needs to be provided. I also think that one should say "no polynomial time algorithm exists" rather than "takes exponential time".

- Finally, for what its worth, as someone quite familiar with graphical models and MRFs, but not so familiar with Markov logic networks, I found this paper quite challenging to understand. The major issue is that very many technical terms ("evidence", "term", "literal", "relation", "inference", etc.) are used undefined, and there is no clear reference provided where these terms are precisely defined. After reading quite a few of the background papers I think I understand these, but the literature on MLNs doesn't seem totally consistent in usage, which makes things a challenge. (For example, typically "evidence" is a set of fixed values for a set of variables, whereas this paper uses it as a set of constraints on variables or pairs of variables.) At a minimum, the paper should state what is meant by inference, summarize what classes of symmetries previous work has taken advantage of to make inference tractable, and be precise about what evidence is and how it changes inference. It could be than in an eight page paper there is no way to make the current paper easily understandable by readers that aren't MLN experts, but I do think more effort could be made in this direction.


Edit after author response:

In my initial review, I had a concern about how conditioning could affect the efficiency of inference. After reading the other reviews, the author response, and discussing with the other reviewers, I'm convinced this isn't an issue, and am correspondingly raising my rating.
Q2: Please summarize your review in 1-2 sentences
The idea seems OK, though a moderate increment on existing results. I would lean towards being more positive if I understood why the main result is true, which I hope can be addressed in discussions.

Submitted by Assigned_Reviewer_7

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper explores how Boolean rank can be be used to characterize the efficiency of lifted inference for binary evidence (but, interestingly not beyond binary).

Unfortunately there are some very misleading statements in the paper which would lead the reader to make some erroneous conclusions. For example "They perform inference that is polynomial in the number of objects in the model [5] and are therein exponentially faster than classical inference algorithms." is only true for very restricted conditions (what the authors later call "domain-liftable"). [5] only proves it for formulae that only contain up-to two logical variables. Moreover it isn't true in general. You need to be much more precise in your claims.

You need to give a reference for theorem 1. It isn't new to this paper.
Q2: Please summarize your review in 1-2 sentences
This is a good paper that explores how, even though how lifted inference with observations of more than 2 arguments is #P-hard, it can be efficient if the observations have a low Boolean rank. This observation leads to an approximation scheme to find an approximation with low Boolean rank.
Author Feedback

Q1:Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 6000 characters. Note however that reviewers and area chairs are very busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point.
We thank the reviewers for their kind remarks.

In response to Reviewer 4, on the use of WebKB, we want to point out that WebKB with binary evidence is a far harder problem than what is typically considered in lifted inference papers (even most approximate ones). It has >1 million random variables and factors, and 333 MLN formulas.
We see the use of WebKB as a strength of the paper.

We do agree with Reviewers 1 and 4 that more experiments, with more data sets and algorithms, would be very interesting. Given the space constraints, we believe these should be left for future work. We also hope that other lifted inference authors will take the general ideas presented here and apply them to their specific algorithms.

In response to Reviewer 6, we understand that the paper can be challenging for people unfamiliar with statistical relational learning. We want to address this based on your feedback, in so far that space allows us.


We would like to clarify the following for Reviewer 6 (based on the initial review):

- There seems to be a mix-up between random variables (ground atoms) and logical variables in your review.
First-order atoms with constraints on the logical variables represent sets of random variables. Similarly, first-order formulas represent sets of factors. These are used by lifted inference algorithms to reason about groups of random variables or factors efficiently, without grounding.
Evidence is a truth-value assignment to a set of random variables (as in Bayesian networks), which can equivalently be represented by a truth-value assignment to first-order atoms with constraints on the logical variables.

- Viewing our work as "augmenting the graph with additional random variables and conditioning on them" is a bad premise to understand the contributions of the paper.
Lifted inference is all about the symmetries expressed by a first-order formulation of the model. A key insight of lifted inference is that the complexity of inference can be exponential in the size of this first-order formulation, but only polynomial in the domain size, that is, the number of objects in the world. When looking only at the induced probabilistic graphical model (PGM), these two parameters blend into one, namely the size of the PGM, and those insights are lost.
This paper is all about distinguishing two parameters for evidence: the size of the evidence (~domain size) and its Boolean rank (~size of first-order representation). Our key insight is that our encoding increases the size of the first-order representation with the Boolean rank, and not with the size of the evidence. Hence the complexity of inference will be polynomial if the Boolean rank is bounded.
As you can see, distinguishing these two parameters is essential in understanding our work, and the difference is lost when looking at the PGM.

- About the suggestion to "state that inference is exponential in the number of values variables can have": this is not the case for lifted inference (assuming you mean logical variables). In fact, the definition of domain-lifted inference states exactly the opposite [4].

- About the comment that our experiments use a very large number of variables, and many applications have a small number: this is not the case in statistical relational learning. MLNs are typically small, but they are applied to very large graphs, such as social networks, or web pages, which lead to very large domain sizes, and a very large number of random variables.

- About the changed structure of \Delta' being a problem: the structure of Formula 2 has no influence on the 'liftability' of the model \Delta'. It beaks no symmetries, and it is easily handled by modern lifted inference algorithms. For example, lifted inference algorithms FOVE, PTP and WFOMC can all eliminate unary atoms, reducing Formula 2 to a unit clause, which can be propagated or absorbed. In short, it poses no problem. We will say that more explicitly.

- The mapping from binary to unary evidence does not require an exponential number of new variables (random or logical). The number of additional unary predicates, first-order atoms, and logical variables added to the model is all linear in the Boolean rank and independent of the domain or evidence size. The number of additional random variables is polynomial in the domain or evidence size.
The exponential blowup comes from the fact that Formula 2 has size linear in the Boolean rank. Because in general, domain-lifted inference can be exponential in the size of the MLN (not the domain size), adding Formula 2 to \Delta' makes that inference can be exponential in Boolean rank, but not the size of the evidence.

- The reference for Theorem 1 is [11]. We agree with your suggestion there.