Submitted by
Assigned_Reviewer_4
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
The paper introduces a probabilistic model for
networks which assigns each node in the network to multiple, overlapping
latent communities. Inference is done using a stochastic variational
method and the experimental evaluations are performed on very large
networks.
The first thing I note is that you do not cite Morup et
al. (2010), "Infinite multiple membership relational modelling for complex
networks", which in truth was the first work to perform inference for a
latent feature relational model on large datasets, in effect rendering
your statement on lines 067-068 ("... these innovations allow the first...")
incorrect. This is a rather serious oversight, because their paper not
only performs large-scale inference, but their method is also an MCMC
method, which is well-known to usually produce more accurate results than
variational methods.
I believe the strongest contribution from
this paper is the application of a stochastic variational inference method
to a relational data model. This aspect is likely to have the highest
impact. This paper provides a new choice in the toolbox of probabilistic
inference for relational data models, and this is always highly welcome,
especially since the inference is shown to be viable on massive datasets.
It is primarily for this reason that I will recommend the paper to be
accepted.
I also think the experiments are very well done. It is
clear that great care was taken to execute the large experimental setups
well, and that the authors thought about the various engineering choices
that needed to be made to get the inference to work. I don't think these
engineering choices are all that elegant, but having these procedures
(well) outlined in a paper will certainly do service to the network
modelling community.
Also, I don't see how this inference method
is "online" in any sense. The pruning moves are your solution to
automatically infer the number of communities/features K, which is already
done by other nonparametric models for relational data. Perhaps I am
missing something, but I think the online aspect should be removed from
the paper. Indeed, I don't really even see where in the paper any aspects
making it particularly "online" are discussed. Q2: Please
summarize your review in 1-2 sentences
In summary, the paper by Morup et al. (2010) should
have been cited. The main contribution of the paper is the addition of a
stochastic variational method to the inference toolbox for large-scale
relational data. Submitted by
Assigned_Reviewer_5
Q1: Comments to author(s).
The paper addresses an important issue: Inference for
Bayesian Nonparametric Relational Models for large networks. The
authors use an assortative HDPR model to recover latent communities in
social networks previously examined with the mixed-membership stochastic
block model and demonstrate an improvement of perplexity scores and link
prediction accuracy. They also use their community structure to visualize
business and governmental relationships extracted from an existing
database. Their pruning moves seem to be appealing and productive in
large networks. They also provide a logical approach for determining
candidates for pruning, which I like. I have only a minor concern about the
selection of t*: will the results be sensitive to the choice of t*?
Perhaps the authors can briefly extend the discussion along these lines.
Q2: Please summarize your review in 1-2
sentences
The authors present an interesting, novel approach to
inference for Bayesian nonparametric relational models. The approach
is very useful in large networks. Submitted by
Assigned_Reviewer_6
Q1: Comments to author(s).
This paper presents a method for learning a
nonparametric, mixed-membership model of relational data. The model is a
straightforward extension of the aMMSB using an HDP. They combine recent
advances in stochastic variational inference with novel pruning strategies
geared for this particular application.
The authors describe some
of the care in creating an efficient implementation for this particular
model (ensuring linear time and space complexity).
Lingering questions:
- Why not compute aMMSB for a wide range of K and compare their aHDPR to
aMMSB's best performance? It would be nice to confirm that aHDPR is
obtaining a very good result with far less computational effort than
fitting aMMSB for a wide range of K.
- It would be nice to see the role of epsilon on the predictive accuracy
of the model, though they mention that they leave it for future work. For
that matter, discussion regarding robustness to other hyperparameters was
quite limited as well.
- Why only analyze a single connected component for each network? Both
models should be able to quickly identify the non-overlapping nature of
these groups.
- Where is the model making mistakes? Is it for nodes with low degree, or
for nodes that are part of groups that get (accidentally) pruned?
- How does this predictive performance compare to latent feature
relational models?
- I found their strategy of choosing samples to use for their pruning rule
insightful and interesting. Can these pruning techniques be extended to
other nonparametric relational models (e.g. LFRM)? Why or why not?
- I also found their use of node-specific learning rates intriguing. For
how much it appears to matter in the results, there is a very limited
amount of discussion and insight.
- It would be nice to see aMMSB fit to a wide range of K and make sure
that aMMSB at least *can* achieve the same perplexity as the proposed
model.
- The visualization does not really emphasize the mixed-membership nature
of the model.
Q2: Please summarize your review in 1-2 sentences
The paper provides a nonparametric extension to a
mixed-membership model for relational data, and presents a stochastic
variational inference algorithm paired with pruning moves. Several
techniques are presented that may be applicable for extending related
models.
Q1: Author
rebuttal: Please respond to any concerns raised in the reviews. There are
no constraints on how you want to argue your case, except for the fact
that your text should be limited to a maximum of 6000 characters. Note
however that reviewers and area chairs are very busy and may not read long
vague rebuttals. It is in your own interest to be concise and to the
point.
We thank the reviewers for their helpful comments. We
emphasize that there are a number of novel contributions in this paper,
including: the proposal of an HDP relational model; the adaptation of
nested variational truncations (developed for HDP topic models) to this
case; the design of lineartimeandspace structured variational updates
for assortative likelihoods; the development of effective online methods
for pruning extraneous communities; and careful experimental validation
showing major improvements relative to parametric mixed membership models.
RELATED WORK (R4): We thank R4 for pointing out a missed
citation to Morup et al., "Infinite Multiple Membership Relational
Modeling for Complex Networks". The paper develops a Bayesian
nonparametric mixed-membership latent feature model via the Indian Buffet
Process (IBP) that scales to networks that are similar in size to our
paper (via MCMC inference and likelihood approximations). We will add a
citation, and remove the claim that our approach is the first scalable
Bayesian nonparametric relational model. We nevertheless note a number of
substantial differences from this work: we propose a novel HDP (rather
than IBP) relational model, use stochastic variational inference rather
than MCMC, and our experiments show scalability with hundreds of learned
communities.
PRUNING MOVES (R5 & R6): R6 asks if our
pruning moves can be extended to other nonparametric, latent feature
relational models. We believe this may be possible: while we look for
communities where the variational Dirichlet distributions have low total
mass, binary IBP models could identify features whose variational beta
distributions have low total mass. Of course, details would need to be
worked out, and this is not the focus of our paper.
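The low-mass criterion described above can be sketched concretely. The following is an illustrative reconstruction rather than the paper's implementation: it assumes the variational Dirichlet parameters are held in a NumPy array, and the function and threshold names are ours.

```python
import numpy as np

def prune_candidates(dirichlet_params, threshold):
    """Flag communities whose variational Dirichlet distributions carry
    little total mass across all nodes (candidates for pruning).

    dirichlet_params: (num_nodes, num_communities) array of variational
    Dirichlet parameters. threshold: fraction of the total mass below
    which a community is considered extraneous (an assumed criterion).
    """
    # Total variational mass assigned to each community across nodes.
    community_mass = dirichlet_params.sum(axis=0)
    # Normalize so the criterion is insensitive to network size.
    relative_mass = community_mass / community_mass.sum()
    return np.where(relative_mass < threshold)[0]
```

A community whose share of the total variational mass falls below the threshold becomes a pruning candidate; under the same idea, an IBP model would instead inspect the total mass of its variational beta distributions.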
LEARNING RATES
(R6): We agree that more discussion of node-specific learning rates
should be included. While the correctness of this approach is established
by the arguments in the submission, it is less obvious why performance
should improve. We believe this is related to the variance of different
components of the (stochastic) gradient vector, which will be large for
nodes that are poorly represented in the batch. A more formal analysis of
this phenomenon is ongoing work.
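To make the intuition concrete, here is a minimal sketch of per-node learning rates under a standard Robbins-Monro schedule, where each node's step size decays with the number of mini-batches that have actually touched it. The class name and default decay constants are our assumptions, not the paper's exact schedule.

```python
import numpy as np

class NodeLearningRates:
    """Per-node Robbins-Monro rates: rho_i = (tau + t_i)^(-kappa), where
    t_i counts the mini-batches that have included node i so far."""

    def __init__(self, num_nodes, tau=1.0, kappa=0.6):
        # kappa in (0.5, 1] satisfies the usual stochastic-approximation
        # conditions; tau delays early, high-variance steps.
        self.counts = np.zeros(num_nodes)
        self.tau = tau
        self.kappa = kappa

    def step(self, node_ids):
        # Only nodes observed in the current mini-batch advance their
        # own clocks, so rarely sampled nodes keep larger step sizes.
        self.counts[node_ids] += 1
        return (self.tau + self.counts[node_ids]) ** (-self.kappa)
```

With a global clock, a node absent from most mini-batches would take tiny steps on stale gradients; per-node clocks let its noisy but informative updates count for more.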
EXPERIMENTS (R6): Regarding
the choice of the number of communities K for the aMMSB, note that we
already include a fairly extensive comparison in Figure 2. For the
largerscale experiments in Figure 3, computational constraints forced us
to focus on a narrower range of options. Note that even with a "good" K,
the aMMSB may (and often does) have worse predictive performance than the
aHDPR, because the models have non-equivalent hierarchical community
membership priors.
Our decision to analyze the largest connected
component of each network allows us to focus our quantitative comparisons
on the challenging problem of identifying mixed community memberships for
"active" nodes with many neighbors. For the small disconnected cliques
found in real-world networks, all models easily assign them to unique
communities but have little information to use for prediction.
HYPERPARAMETERS (R5 & R6): R5 asks about sensitivity to
the pruning threshold t*, and R6 about sensitivity to the
between-community threshold parameter epsilon. Currently, both parameters
are set to be "very small" (and in the case of epsilon, consistent with
prior work on the aMMSB); performance for a wide range of other "small"
values is similar. It could potentially be useful to increase t* to allow
more rapid pruning and greater speed, or increase epsilon if less
assortativity were expected. But given time and space constraints, we
focused on other issues in our experiments.
TERMINOLOGY (R4):
Our usage of the term "online" was motivated by prior work in this
area, for example Hoffman et al.'s "Online learning for latent Dirichlet
allocation". We agree the word "stochastic" is also relevant, and will
consider potential changes.
