
Submitted by Assigned_Reviewer_1
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper presents an algorithm for Bayesian network learning.
In contrast to exact methods, this method can handle settings with thousands of variables, without restricting the number of parents of each variable. The method is composed of two parts, a novel parent selection algorithm and an improvement on an existing structure optimization approach.
One of the weak points of the paper is the parent selection heuristic, because there is no guarantee that the optimal parent sets are a subset of the parent sets that are chosen.
It is shown in practice that their approach outcompetes other parent selection methods when comparing score functions. However, the experiments were done on real-world data sets, so there is no ground-truth structure to compare against.
It is thus not clear whether their method recovers the correct structure.
It would be more convincing to show that the method recovers the correct parent sets on simulated data, where the true structure is known. In addition, the description of the independence selection algorithm is difficult to follow. It would have been helpful to include some analysis of how many parent sets are explored, and how many of them end up in the closed list.
Regarding the improvement to order-based search for structure optimization, the paper presents a clever way of considering the extra acyclic structures that order-based search would have ruled out, while maintaining the same overall time complexity.
For this section, it was unclear how the initial ordering over the variables was determined. The experiments demonstrate that their method outperforms Gobnilp (an exact approach) and order-based search on datasets with more than 500 variables, for structure optimization based on the BIC score.
Again, the results could differ if one evaluated whether the correct structure is recovered.
Q2: Please summarize your review in 1-2 sentences
This paper proposes a novel heuristic method for learning the structure of a Bayesian network, specifically in cases where the number of nodes prevents the use of exact approaches. However, the paper is not clearly written, and the experiments are not quite comprehensive.
Submitted by Assigned_Reviewer_2
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
Update: I have read the authors' rebuttal. I recommend that the authors enhance the presentation if the paper is accepted.
The paper proposes a novel approximation to the BIC score that can guide the search for a good parent set. The benefits of the proposed approximation are discussed both theoretically and practically. The second contribution is an enhancement of current structure optimization.
In my opinion, the paper is very simple and could have been described much more succinctly and clearly. The first contribution is nice in terms of pruning parts of the search space. As for the second contribution, do we theoretically only know that it will be at least as good as the current structure optimization?
A good paper overall. My main criticism is the unnecessarily verbose explanation for a very simple, yet useful contribution.
Q2: Please summarize your review in 1-2 sentences
The paper proposes an approximation of BIC that can guide the search for a BN structure. The paper could have been written much more succinctly. The modification of the structure optimization is incremental.
Submitted by Assigned_Reviewer_3
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper describes tricks to scale Bayesian network structure learning to thousands of variables. This is achieved by developing new heuristics for candidate parent set identification and for the subsequent order-based structure optimization.
In general, the paper is clearly written and easy to read. There are issues in editing and style, but they do not affect readability (much). The suggested heuristics feel a bit ad hoc, so the value of the work is ultimately judged by the empirical evaluation. In this regard, I feel that the work could be better. Notably, comparisons to simple local greedy searches (which also scale well to 10,000 variables) should be included or referred to.
In the following, I will give more detailed comments:
- Intro, second paragraph: "Usually". I do not know where this "usually" comes from, but for me it is annoying. It also occurs in many other places in the text, and more often than not it provokes a counter-reaction. Yes, such a two-phase strategy has been used before, but to say it is "usually" done this way is a statement that has to be proven right.
- Page 2, second paragraph: It is inconsiderate to use a citation as a part of speech, as in [17]. Even for an expert reader, this requires stopping and looking up the reference.
- Page 2: The BIC formula looks a bit ugly. It would have been better to write $BIC(G)=\sum_{i=1}^n BIC(X_i,\Pi_i)$ and then give $BIC(X_i,\Pi_i) = ...$ on its own display line.
- Page 3, Section 3.1: The explanation makes the reader suspicious. It is exactly the non-monotonic effect of combining parent sets that makes structure learning hard. How would BIC* manage with Z = X xor Y (or, more generally, a parity function)? But, well, the experiments will tell.
- Page 5: Which data was used to produce Fig. 1?
- Page 5: While the space of orderings is indeed smaller than the space of DAGs, 1000! is still quite a big number, so calling it "convenient" is a bit of a stretch.
- Page 6, Section 5.1: Sample sizes would have been nice to have too, perhaps as n/N in Table 1.
- Page 7, last paragraph: Gobnilp is said to be better because it is an exact solver. But it is an exact solver also when n > 500, so how can it all of a sudden be worse? The explanations as they stand are not very convincing.
- Page 7: How did you set the structure and parameters of the random networks?
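The parity concern raised in the Page 3 comment can be made concrete with a small sketch. This is purely illustrative and not the paper's method: `cond_entropy` is a hypothetical helper standing in for the likelihood part of a decomposable score such as BIC, and it shows why single-parent improvements see no signal on Z = X xor Y while the pair {X, Y} determines Z exactly.

```python
import math
import random
from collections import Counter

def cond_entropy(z, cond):
    """Empirical H(Z | cond) in bits, estimated from paired samples."""
    n = len(z)
    joint = Counter(zip(cond, z))   # counts of (condition, z) pairs
    marg = Counter(cond)            # counts of each condition value
    return -sum((c_zj / n) * math.log2(c_zj / marg[c])
                for (c, _), c_zj in joint.items())

rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(20000)]
y = [rng.randint(0, 1) for _ in range(20000)]
z = [a ^ b for a, b in zip(x, y)]   # Z = X xor Y

h_x = cond_entropy(z, [(v,) for v in x])   # stays near 1 bit: X alone is useless
h_xy = cond_entropy(z, list(zip(x, y)))    # exactly 0: {X, Y} determines Z
```

Any heuristic that grows parent sets one variable at a time by local score gain has no gradient to follow here, which is exactly the non-monotonicity the comment points at.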
All in all, it is a rather simple paper, and I find it interesting to scale BN structure learning to 10,000 variables. I really would have liked to see a better empirical comparison with heuristic methods such as greedy local search, on sample sizes large enough to allow a large number of parents in the optimal network.
Q2: Please summarize your review in 1-2 sentences
In general, the paper is clearly written and easy to read. There are issues in editing and style, but they do not affect readability (much). The suggested heuristics feel a bit ad hoc, so the value of the work is ultimately judged by the empirical evaluation. In this regard, I feel that the work could be better. Notably, comparisons to simple local greedy searches (which also scale well to 10,000 variables) should be included or referred to.
Submitted by Assigned_Reviewer_4
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper presents a heuristic for learning Bayesian networks with a very large number of variables. The contributions are an adaptive heuristic algorithm for choosing the parent sets whose scores are computed, and an improved version of ordering-based search.
While it is good to compare the method to exact methods, its main competitors are other heuristics such as greedy search, GES, or MMHC, and it should be compared to those, too.
Q2: Please summarize your review in 1-2 sentences
This paper introduces a new heuristic for learning Bayesian networks from very large data sets. The method seems to perform well in practice. However, it was not compared to some of the obvious competitors.
Q1: Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except that your text should be limited to a maximum of 5000 characters. Note, however, that reviewers and area chairs are busy and may not read long, vague rebuttals. It is in your own interest to be concise and to the point.
Assigned_Reviewer_1
"One of the weak points of the paper is the parent selection heuristic presented because there are no guarantees that the optimal parents sets are a subset of the parent sets that are chosen. [...] It's not clear whether their method achieves the correct structure or not."
=> Sect. 5.2 presents experiments on data simulated from known networks. Although we do not report the distance between the actual network and the networks learned by the different methods, we did measure the BIC score of the learned networks, as done in the literature. Assuming all networks to be equally probable a priori, the BIC score approximates the log posterior probability that a network is the actual generative model. The networks learned by our novel approach have consistently higher BIC scores (thus higher posterior probability of being the correct model) than the networks learned by the competitors. Moreover, the graph of a Bayesian net (BN) simply encodes conditional independencies. How close a learned structure is to another (or to the true one) is not an obvious question; simply checking for matching arcs does not reflect the distance between the joint distributions.
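The asymptotic relation invoked here can be made explicit (a sketch, assuming a uniform prior over structures; $D$ denotes the data):

$$
\log P(G \mid D) = \log P(D \mid G) + \log P(G) - \log P(D) \approx BIC(G) + \mathrm{const},
$$

so the posterior satisfies $P(G \mid D) \propto e^{BIC(G)}$, up to the error of BIC as an approximation of the log marginal likelihood $\log P(D \mid G)$.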
"It would have been helpful if there was some analysis on how many parent sets are explored [...]"
=> We provide some hints on how the parent sets are explored in Figure 1. It shows that our method explores the highest-scoring parent sets first, whereas sequential ordering is unable to prioritize them. We can provide additional examples and further empirical results in the final supplementary material.
Assigned_Reviewer_3
"Notably, the comparisons to simple local greedy searches (that scale well to 10000 variables too) should be included or referred to. [...]"
=> The networks learned by our methods have consistently higher scores than the networks learned by both Gobnilp and the ordering search of Teyssier and Koller, UAI'05 (T&K'05). The ordering search of T&K'05 is a heuristic method that improves over the methods mentioned by the reviewer. Indeed, T&K'05 show that their search yields scores higher than (or equal to) both greedy hill-climbing over the space of graphs and optimal-reinsertion search for BN structure learning (Moore and Wong, ICML'03). Gobnilp is a state-of-the-art exact solver, and its approximate anytime results are in any case a relevant benchmark.
Our approach does not need a maximum in-degree (k), but its competitors do. We set a limit k=6 to fairly assess the different methods. The limit k=6 is already much larger than current practice and the state of the art. For instance, the results reported on the Gobnilp webpage adopt k=2 when dealing with more than 60 variables.
"Gobnilp is said to be better because it is exact solver. Now, it is exact solver also when n > 500 [...]"
=> Gobnilp is an optimization package for learning the structure of BNs with anytime behavior, based on cutting planes and mixed-integer linear programming. It finds an approximate solution if stopped early. In our experiments, Gobnilp was always approximate when provided with more than 500 variables, even though we allowed 24 hours of computation.
"Which data was used to produce Fig 1"
=> Data from the child network, n=5000. It concerns the exploration of parent sets for the variable 'Disease'.
"How did you set the structure and parameters for random networks?"
=> (Page 8 of the paper.) Each variable gets a set of parents with random size from 0 to 6 (uniformly). Parameters were sampled from a Dirichlet distribution with random parameters in (0,10).
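A minimal sketch of this generation procedure (hypothetical code, not the authors' implementation; in particular, drawing parents from the predecessors in a random topological order is an assumption made here to guarantee acyclicity):

```python
import random

def dirichlet_row(k, rng):
    # Dirichlet sample via normalized Gamma draws; each concentration
    # parameter is drawn uniformly from (0, 10) as stated in the rebuttal
    # (a small lower bound keeps the Gamma draws numerically safe).
    alphas = [rng.uniform(0.1, 10.0) for _ in range(k)]
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def random_network(n_vars, n_states=2, max_parents=6, seed=0):
    rng = random.Random(seed)
    order = list(range(n_vars))
    rng.shuffle(order)                            # random topological order (assumed)
    parents, cpts = {}, {}
    for pos, v in enumerate(order):
        k = rng.randint(0, min(max_parents, pos))  # parent-set size uniform in 0..6
        parents[v] = rng.sample(order[:pos], k)    # earlier variables only: acyclic
        n_rows = n_states ** k                     # one CPT row per parent configuration
        cpts[v] = [dirichlet_row(n_states, rng) for _ in range(n_rows)]
    return parents, cpts
```

Each CPT row sums to one, and since every variable only draws parents from its predecessors in the shuffled order, the resulting structure is a DAG by construction.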
Assigned_Reviewer_2
"we theoretically only know that it will be at least as good as the current structure optimization?"
=> We theoretically know it will be better than T&K'05. We have experimentally seen that it is also better than Gobnilp, which is arguably the state of the art for structure learning.
Assigned_Reviewer_4
"While it is good to compare the method to exact methods, the main competitors of this method are other heuristics such as greedy search, GES or MMHC and it should be compared to those, too."
=> (Please see also the reply to Assigned_Reviewer_3.) We find consistently higher scores than both Gobnilp and T&K'05; the latter is shown to improve on other methods (M&W, ICML'03). Moreover, we focus on score-based learning, so comparing to methods based on independence tests is not ideal.
Assigned_Reviewer_6
"missing results of section 5.2 should be added to the supplemental material"
=> We will provide these detailed results as supplementary material. Moreover, we also plan to release our software as open source.
Assigned_Reviewer_7
"The result looks promising, but all based on BIC score."
=> With slight modifications, our proofs about the BIC score can be extended to AIC and to the MIT score (de Campos, 2006, JMLR). We instead leave for future work the extension to alternative scores such as BDeu.

