
Submitted by Assigned_Reviewer_1
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
Summary:
The paper provides a new nonparametric model for estimating the structure of a probabilistic graphical model, considering an infinite-dimensional exponential family (based on an infinite-dimensional reproducing kernel Hilbert space). Estimation is based on score matching, which avoids intractable/expensive computation of the log-partition function. The authors show that the estimator can be computed using any current algorithm for solving group lasso. They also provide a theoretical consistency analysis of the estimated sparse structure using the primal-dual witness approach.
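For readers less familiar with score matching, the point in the summary above, that the log-partition function never needs to be computed, can be seen in a one-dimensional toy model. This is an illustrative sketch only, not the paper's estimator; all names below are hypothetical:

```python
import numpy as np

# Score matching fits an unnormalized model log p~(x) = -theta * x**2 / 2
# (a zero-mean Gaussian with precision theta; the normalizer is never computed).
# Hyvarinen's objective J(theta) = E[ 0.5 * psi(x)**2 + psi'(x) ] with
# psi(x) = d/dx log p~(x) = -theta * x gives the empirical objective
#   J_hat(theta) = mean( 0.5 * theta**2 * x**2 - theta ),
# which is minimized in closed form at theta_hat = 1 / mean(x**2).

rng = np.random.default_rng(0)
x = rng.normal(scale=0.5, size=200_000)  # true precision is 1 / 0.25 = 4

theta_hat = 1.0 / np.mean(x ** 2)
print(theta_hat)  # close to the true precision 4, no partition function needed
```

The minimizer depends only on the data and the derivatives of the unnormalized log-density, which is exactly why the method scales to models whose normalizing constant is intractable.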
Qualitative Evaluation
Overall, this is not a well-written paper. It does not provide a good review of other related tools based on score matching. The authors provide theoretical analysis for the claims in the paper, but they do not compare their results with existing approaches; a section on related work would have been helpful. The analysis/method in the paper is mainly based on [28], but the authors do not discuss the relationship of their work to [28]; they just cite this paper as they directly use some of its results without any comment.
It is not clear how good or bad their consistency analysis is relative to existing papers in the literature. Although not discussed in detail in the main paper, the non-asymptotic rates of convergence seem much worse than in the parametric setting, or even the semiparametric setting as in [18]. So, with finite samples, it is unclear when or why the method should be used. Again, there was no discussion in the paper to illuminate this or any other point.
The problem addressed in the paper is interesting and challenging. The model is based on score matching, which was already addressed in [28], but the authors provide the connection with group lasso, which yields a good solver for their model. However, the current algorithm only works with the l1 norm, and it is not clear whether this model can be applied with other regularizers to capture different structures and, if so, how challenging the solver would be. The consistency analysis is based on the primal-dual witness approach, which can be difficult to extend to other regularizers.
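The reduction to group lasso discussed above means any off-the-shelf proximal solver applies; the core primitive of such solvers is block soft-thresholding, the proximal operator of the group penalty. A minimal sketch (illustrative code, not the authors' implementation):

```python
import numpy as np

def block_soft_threshold(beta, lam):
    """Proximal operator of the group-lasso penalty lam * ||beta||_2
    for a single group: shrinks the whole block toward zero, and zeroes
    it out entirely when its norm falls below lam."""
    norm = np.linalg.norm(beta)
    if norm <= lam:
        return np.zeros_like(beta)
    return (1.0 - lam / norm) * beta

# A group whose norm exceeds lam is shrunk but kept (here scaled by 0.8)...
print(block_soft_threshold(np.array([3.0, 4.0]), 1.0))
# ...while a weak group is eliminated entirely, which is what produces
# groupwise sparsity, i.e. missing edges in the estimated graph.
print(block_soft_threshold(np.array([0.1, 0.1]), 1.0))
```

This all-or-nothing behavior at the group level is specific to the l2-of-groups penalty, which is one reason extending the analysis to other regularizers, as the review notes, is nontrivial.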
Q2: Please summarize your review in 1-2 sentences
The paper provides a new nonparametric model for estimating the structure of a probabilistic graphical model, considering an infinite-dimensional exponential family. The paper is not well written; while the results look potentially interesting, no comparison with existing methods is made in terms of the technical results. It seems the non-asymptotic rates obtained are worse than certain existing results. The paper is mainly based on [28], so a clearer exposition of the novelty would be helpful.
Submitted by Assigned_Reviewer_2
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper learns a nonparametric graphical model via score matching, which effectively bypasses the need to estimate the normalizing constant. Theory and simulations were provided.
Quality: The nonparametric graphical model and the neighborhood estimation method are not entirely new. However, what is interesting is that the need to compute the normalizing constant is bypassed by using score matching. Parametrized by an RKHS, the formulation in (8) is shown to achieve neighborhood-selection consistency.
Clarity: The paper is well presented.
Originality and significance: The paper is original in that using score matching, one does not need to evaluate the normalizing constant.
Q2: Please summarize your review in 1-2 sentences
The score matching objective proposed in the paper for learning nonparametric graphical models is free of normalizing constants. This is an interesting idea.
Submitted by Assigned_Reviewer_3
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
Summary: The paper presents a novel nonparametric procedure to estimate the structure of a sparse undirected probabilistic graphical model from data. It is based on using kernels to approximate the resulting distribution by an infinite-dimensional member of the exponential family, and uses the RKHS representer theorem to reduce the corresponding optimization problem to one that can be efficiently tackled using existing group lasso solvers. It is well written, clearly explained, and utilises at times quite challenging maths to arrive at an elegant solution. The ideas and techniques developed in this paper should be of interest to many researchers in the field of structure learning. The only weakness is the evaluation section, which should have included more complex graph structures with weaker edges to give a better indication of performance in practice.
Quality: Good idea, well executed. All steps in the reduction of the (penalised) score matching objective to the quadratic minimization problem make sense, although I did not check the detailed proofs in the supplementary material, and the abstract parameterisation in (12) is quite difficult to follow.
As stated above, the only weakness is the evaluation: the (very sparse) chain graph used in the examples, in combination with edge strength \Omega_s = 0.4 (line 359), makes for rather easy structures. As a result, the performance in Fig. 1 is already very good at N=60 data points and nearly perfect at N=100 ... something that is very atypical for real-world structure inference problems. I would have preferred to see the method evaluated on more realistic models to get a better impression of its actual performance.
Clarity: Very good: the problem and main idea are well described, concepts and formulas are carefully defined, and the paper does a good job of making the overall argument accessible to most readers. For example, section 2.3 was compact but very helpful for readers less familiar with score matching. However, some of the steps in section 3 could still do with a few hints to make them easier to follow.
Originality: The paper obviously builds on many existing results, but as far as I can tell the crucial contribution culminating in Theorem 3.2 is new, elegant, and hopefully very useful.
Significance: A significant addition to state-of-the-art methods in structure inference. I expect it to lead to various new ideas and applications.
Details:
- 53: check sentence 'we focus ... models,'
- 76: typo -> 'integrable'
- 98: typo -> 'realizations'
- 128: typo 'RHKS' -> 'RKHS'
- 202: rephrase 'an additive assumption on'
- 222: currently A.2 is formulated as an incomplete assumption
- 314: 'and D is diagonal' -> I did not get this from the definition in Theorem 3.2
- 358: for more realistic models sample \Omega_s from e.g. [0.05, 0.8]
- 407: explain how exactly the NPN ROC curve in Fig. 2b is obtained (what different objective is optimized in comparison to glasso and SME)
- 419: typo -> 'nonparametric'; also: 'for learning the structure of a ..'
Q2: Please summarize your review in 1-2 sentences
The paper presents a kernel-based nonparametric procedure to efficiently estimate the structure of sparse undirected probabilistic graphical models from data using existing group lasso solvers.
It is highly relevant, well written, and utilises at times quite challenging maths to arrive at an elegant solution: clearly belongs in the conference.
Submitted by Assigned_Reviewer_4
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
I have read the authors' rebuttal and it did not change my opinion of the paper.

1 Summary of the paper:
This paper proposes to learn the structure of a kernel-based graphical model encoding a joint probability distribution (densities belonging to an infinite-dimensional exponential family [1]). By using score matching as the objective of the learning procedure, the normalisation constant does not need to be estimated, which is scalable and sufficient to learn the structure. Moreover, because of the kernel-based parameterisation, the optimisation problem reduces to an instance of the group lasso. The structure recovered by the algorithm is asymptotically consistent. The proposed method is compared to the graphical lasso for estimating a Gaussian distribution, and to both glasso and an algorithm that knows the model family on a paranormal distribution.
[1] S. Canu and A. Smola. Kernel methods and the exponential family. Neurocomputing, 69(7):714-720, 2006.
2 Summary of the review:
While I did not verify the theoretical analysis provided in the paper, and the experiments could be explained better, I find the paper well organized, the contributions reasonably clear, and the idea very interesting and potentially impactful.
3 Qualitative evaluation
Quality: medium
+ extensive theoretical analysis (not checked fully, seems sound)
+ small-scale empirical evaluation
- no discussion or empirical evaluation of how to choose the kernel parameters. Is grid search + cross-validation feasible and sufficient? How sensitive is the result to the parameters chosen?
- no code provided
Clarity: low
+ well-organised paper, clear language
- lots of mathematical developments that are not easy to follow, especially with notation not necessarily introduced in the main paper
- the parameters of the second model learned in the experiments are not provided
Originality: good
+ I am not familiar with kernel-based methods or kernel-based graphical models, but I could not find really similar works beyond the ones cited
+ significant development with respect to previously published cited work
- still, other works using kernels with PGMs exist. It could be interesting to relate the paper under review to them, for the informed reader like myself. For example, a quick search on Google yielded the following papers: "Nonparametric Tree Graphical Models via Kernel Embeddings", "Kernel Conditional Random Fields: Representation and Clique Selection", "Learning Graphical Models with Mercer Kernels".
Significance: good
+ structure learning for non-Gaussian variables is an important open problem
+ I find the proposed idea very interesting
Impact: 2
4 Other Comments:
Details:
- I understand you focus on the structure estimation problem, but can the normalisation constant be estimated easily enough when the model has been learned, if one also wants the density? Are there some restricted families of kernels that make this possible?
- I think section headers should not be all capitalized.
- I am not familiar with RKHS, so I could not easily go from l204 to l207. I would suggest deriving this in the supplementary material.
- A table of all notations in the supplementary material would make the paper a bit easier to follow.
- l319: why is it Omega(f)^2? (it was Omega(f) in equation 8)
- Experiment 2: what is the dimensionality of the problem? What are the parameter values?
- It would be nice to have the expressions for D, E and F in Theorem 3.2, for example in the appendix.
Typos:
- l25: evaluation of -> the evaluation of
- l53: "the" structure (lots of similar missing "the"s in the paper, though I'm not a native speaker)
- l76: integrable
- l86: H_2?
- l233: S^c not defined
- l323: Omega*_{S^c} not defined
Q2: Please summarize your review in 1-2 sentences
While I did not verify the theoretical analysis provided in the paper, and the experiments could be explained better, I find the paper well organized, the contributions reasonably clear, and the idea very interesting and potentially impactful.
Q1: Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 5000 characters. Note, however, that reviewers and area chairs are busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point.
We thank the reviewers for their thoughtful comments, which will improve the quality of the manuscript. We will revise the manuscript accordingly to make it more self-contained, improve the introduction, and add more content on related work. Below, we address the major concerns.
Reviewer 3

Thank you for the additional references. There is a substantial body of work on learning densities in nonparametric form, and our procedure is related to this line of research. The novel aspects of our approach are that we consider a particular form for the density and use score matching as the divergence. We will provide a more extensive literature review in the revision.
Our computation procedure boils down to solving a group-lasso-like objective. There are a number of scalable implementations that can be used. Grid search plus cross-validation is one way to select the tuning parameter. According to our experiments, the procedure does not seem to be particularly sensitive to the choice of the tuning parameter. We will release our source code on GitHub and present the URL in the revised manuscript.
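The grid search plus cross-validation scheme mentioned above can be sketched generically. In this hedged illustration the fitted model is a deliberately trivial stand-in (a shrunken mean with weight `lam`), not the paper's group-lasso score-matching estimator; only the selection mechanics are the point:

```python
import numpy as np

def cv_select(y, lams, k=5, seed=0):
    """Pick the regularization weight lam minimizing average held-out loss
    over k folds. The 'fit' here (shrunken mean, mean(y) / (1 + lam)) is a
    placeholder for the actual penalized estimator."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for lam in lams:
        err = 0.0
        for i in range(k):
            test = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            theta = y[train].mean() / (1.0 + lam)   # fit on training folds
            err += np.mean((y[test] - theta) ** 2)  # held-out loss
        scores.append(err / k)
    return lams[int(np.argmin(scores))]

# With plenty of data and a strong signal, heavy shrinkage should lose.
y = np.random.default_rng(1).normal(2.0, 1.0, size=500)
best = cv_select(y, lams=[0.0, 0.01, 0.1, 1.0])
print(best)
```

The same loop applies to any estimator with a tuning parameter; only the fit and held-out loss lines change.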
Computing the normalizing constant is a difficult problem even after the underlying model structure is inferred. When the dimension is low, it would be possible to estimate the normalizing constant via numerical integration, regardless of the kernel being used.
Thank you for reading the paper in detail and providing comments on how to improve the presentation.
Reviewer 4

Thank you for the detailed comments. We will carefully incorporate them in the revision.
Reviewer 5

Thank you for pointing out the relevant references. We will discuss them in the Related Work section. Our approach is similar to the references only in the sense that both use a nonparametric approach to estimate the undirected graph structure. However, our approach is substantially different from the references. First, our method is based on score matching, which is then optimized via group lasso, while the references use the log-density ANOVA model, which requires computing the normalization constant once. Second, we provide a detailed proof of the sparsistency property and convergence rate of our approach, while the references have no such result.
Reviewer 6

We will revise our manuscript to better explain our novelty and its relationship with other work.
Our established convergence rate is worse than that for a parametric model. However, our approach applies to a much larger class of models, not only the Gaussian/nonparanormal models. The method in [28] looks similar since it also uses score matching. However, the optimality conditions (Prop. 4.2 in our paper and Theorem 4 in [28]) of the two objective functions are different, which leads to different algorithms and statistical analyses. We have cited all the relevant results in [28].
Reviewer 7

Thank you for your comments. We will significantly improve our manuscript to make it much easier to read.

