
Submitted by Assigned_Reviewer_18
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper proposes closed-form estimators for sparsity-structured graphical models, expressed as exponential family distributions, under high-dimensional settings. The paper is strong in both methodology and theory. The paper is well structured and written. The paper would be even better if the authors could discuss the connection between their method and previous methods that provide closed-form estimators for special MRFs, such as treewidth-1 or planar MRFs.

Q2: Please summarize your review in 1-2 sentences
A good paper.

Submitted by Assigned_Reviewer_25
Q1: Comments to author(s).
The authors provide new closed-form methods for parameter estimation in graphical models. They extend a recent line of work on elementary estimators to the realm of Gaussian and discrete graphical models, and derive theoretical guarantees for the estimators proposed in their paper.
The approach supplied by the authors is quite fascinating in its simplicity. Indeed, parameter estimation in Gaussian and discrete graphical models becomes a computationally challenging task for large p and n, and the closed-form estimators may lead to massive speedups. The method has some drawbacks, however: one chief drawback for Elem-GGM is the seemingly stringent assumption (C-Sparse Sigma), which enforces (approximate) sparsity on the covariance matrix, in addition to the assumed sparsity on the inverse covariance matrix. The reviewer wonders whether there are settings where both types of sparsity would hold simultaneously. In the experiments, the authors do not seem to be concerned about whether Sigma satisfies the assumption, and only take care to make Sigma^{-1} sparse.
Theoretical details for the closed-form estimators of discrete MRFs are also lacking. The main problem is that there is no way of knowing when assumption (C-LogPartition) should hold; even in the literature on tree-reweighted variational approximations, theoretical guarantees on the closeness of the approximation to the true parameters are scarce. The authors assume that the output of (11) will be sufficiently accurate, whereas its accuracy will vary greatly depending on the true parameters, the choice of rho, and the graph structure.

Q2: Please summarize your review in 1-2 sentences
The content of the paper is interesting, but the assumptions involved in validating the theory seem too strong for the scenarios under consideration.

Submitted by Assigned_Reviewer_30
Q1: Comments to author(s).
Focusing on high dimensionality, this paper proposes a class of closed-form estimators for graphical models expressed as exponential family distributions with sparsity structures (particularly regularized by the L1 norm). Built from a backward map and thresholding operators, the resulting estimators are simple to implement and efficient.
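As context for the mechanism this paragraph describes (a thresholding operator composed with an approximate backward map), here is a minimal, hypothetical sketch of what such a closed-form Gaussian graphical model estimator could look like. The function names, the specific composition (soft-thresholding the inverse of a hard-thresholded sample covariance), and all parameter values are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def soft_threshold(A, lam):
    # Element-wise soft-thresholding S_lam(a) = sign(a) * max(|a| - lam, 0),
    # the closed-form solution of the l1-penalized projection step.
    return np.sign(A) * np.maximum(np.abs(A) - lam, 0.0)

def elem_ggm_sketch(X, nu, lam):
    # Hypothetical closed-form Gaussian graphical model estimator:
    # (1) hard-threshold the sample covariance at nu, keeping the diagonal,
    #     so that its inverse can serve as an approximate backward map;
    # (2) soft-threshold that inverse at lam to impose sparsity on Theta-hat.
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                       # sample covariance
    T = np.where(np.abs(S) >= nu, S, 0.0)   # hard-threshold small entries
    np.fill_diagonal(T, np.diag(S))         # always keep the diagonal
    B = np.linalg.inv(T)                    # approximate backward map
    return soft_threshold(B, lam)           # closed-form sparse estimate
```

The whole estimate is two element-wise operations plus one matrix inversion, which is the source of the speedups claimed over iterative regularized-MLE solvers.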
The paper is overall clear. The good experimental results and the simplicity of implementation suggest the usefulness of the proposed method. The supplementary material contains enough detail.
The originality is moderate, considering that it extends a step further from the recent ICML 2014 paper "Elementary Estimators for High-Dimensional Linear Regression".
The work is technically correct, and the results are promising in terms of both efficiency and effectiveness.

Q2: Please summarize your review in 1-2 sentences
This work is interesting and works well empirically. With carefully designed backward maps, this framework could be applied to a wide range of models.

Submitted by Assigned_Reviewer_45
Q1: Comments to author(s).
I personally view this paper quite positively. I feel that the paper was well written, the assumptions and theorems clearly stated, and (interestingly) the proofs quite simple and well written. On reading the reviewer discussion and author feedback, there seemed to be two main issues arising:
(1) Too similar to earlier papers? I think not: even if the high-level idea is similar, the devil is in the details, which need to be fleshed out to get the right rates of convergence, good performance in the experiments, and so on. If I gave someone capable the earlier paper and asked them to derive the kinds of results seen in this one, it would not be "straightforward" to extend those results; it really would need some hard work.
(2) Assumptions too harsh? This might be true, but it does not warrant rejection: the idea of closed-form estimators for these problems is *important*, and the community deserves to be aware of these advances, especially because they have been shown to work very well in practice in this paper. It really is a game-changer compared to the graphical lasso or even neighborhood estimation, which needs lasso solvers. I do think that further research will lighten the assumptions used (or yield a better understanding of them).
The experiments are very promising; this stuff actually seems to work fast and well, so it is possibly a case of a gap between what works and what is provable. I suspect future theory developed by other people will weaken these assumptions; people will want to work on this or extend it because it seems to work well.
In summary, I clearly stand on the side of acceptance.

Q2: Please summarize your review in 1-2 sentences
While I would have been more skeptical of other papers, I feel the ideas behind this paper have the potential to be disruptive in how people use these sparse estimators in practice. There is huge scope for application and theory in the future, and that makes me believe that this paper should be out there for people to read and discuss today, not six months from now.

Q1: Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 6000 characters. Note, however, that reviewers and area chairs are busy and may not read long, vague rebuttals. It is in your own interest to be concise and to the point.

We thank the reviewers for their comments and feedback.
Here are the two main contributions of our submission, each being individually meaningful:
1. From an algorithmic viewpoint, we provide computationally efficient closed-form estimators for graphical models. This is surprising because these problems were traditionally solved using iterative greedy procedures, and more recently via iterative algorithms for regularized convex programs. We present experimental results for the most natural settings, and interestingly, our estimators outperform the compared methods both computationally and statistically.
2. From an analytical standpoint, we analyze the statistical guarantees of our closed-form estimators for graphical models.
I. Distinction from the 2014 ICML paper "Elementary Estimators for High-Dimensional Linear Regression":
The 2014 ICML paper above provided closed-form estimators for high-dimensional linear regression. Our work shares only its high-level motivation with that paper: deriving statistically consistent closed-form estimators. However, doing so for the graphical model learning problem is considerably more complex (indeed, for linear regression under low-dimensional settings, the ordinary least squares estimator is already in closed form). Moreover, the need for closed-form estimators for graphical models, such as discrete or Gaussian graphical models, is even more pressing because, in contrast to linear regression, standard iterative methods for these are very expensive and can only handle limited-sized problems (and accordingly, are an active area of research in ML and Statistics).
II. Condition for the Gaussian graphical model case: Our corollary requires the condition that every row of \Sigma has bounded \ell_q norm (for q between 0 and 1), which allows a \Sigma that is not truly sparse but has many small entries. Moreover, this is just a sufficient condition; at the cost of presentational simplicity, our theorem can be shown to hold under more general conditions, which would be satisfied, for instance, even by a merely diagonally dominant \Sigma. These conditions actually include many natural cases, for instance, the case where \Theta has a chain, star, 4-NN, or 8-NN lattice structure. We will clarify this in the final version.
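The chain-structured case mentioned above can be checked numerically. The sketch below is illustrative only: the tridiagonal parameter rho = 0.4 and the exponent q = 0.5 are assumptions chosen for the demonstration, not values from the paper. It computes the maximum row \ell_q quasi-norm of \Sigma = \Theta^{-1} for a chain-structured \Theta; because the entries of \Sigma decay geometrically away from the diagonal, these row norms stay bounded as p grows.

```python
import numpy as np

def chain_precision(p, rho=0.4):
    # Tridiagonal (chain-structured) precision matrix Theta; rho = 0.4 is an
    # illustrative value keeping Theta positive definite.
    off = rho * np.ones(p - 1)
    return np.eye(p) + np.diag(off, 1) + np.diag(off, -1)

def max_row_lq(Sigma, q=0.5):
    # Largest row l_q quasi-norm sum_j |Sigma_ij|^q for 0 < q < 1,
    # the row-wise quantity the condition asks to be bounded.
    return float(np.max(np.sum(np.abs(Sigma) ** q, axis=1)))

# Entries of Sigma = Theta^{-1} decay geometrically off the diagonal for a
# chain, so the maximum row l_q quasi-norm barely moves as p grows.
norms = [max_row_lq(np.linalg.inv(chain_precision(p))) for p in (50, 100, 200)]
```

Doubling p from 50 to 200 here changes the row quasi-norms only in the geometrically small tails, consistent with the claim that \Sigma need not be truly sparse, only to have rapidly decaying small entries.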
Finding *necessary* conditions under which our closed-form estimators are consistent remains future work; our closed-form estimators might require possibly stronger conditions than those required by standard \ell_1-regularized MLE-like estimators, but they provide huge computational gains. Nevertheless, the statistical guarantees still hold for a quite broad class of \Theta, and we observe huge improvements for such \Theta, as shown in our theorem and experiments (in the experiments, we consider the most difficult and most natural case: a randomly generated sparse graph \Theta).
Assigned_Reviewer_25:
We would like to correct the reviewer's misunderstanding of our assumption; Corollary 2 *does not* require that eq. (11) be accurate. (C-LogPartition) concerns the accuracy of the approximate log-partition function: it requires that the approximate backward mapping of the *population* mean \mu^* be close to \theta^* within \eps error. If the approximate backward mapping is exact, then (C-LogPartition) trivially holds with \eps = 0. The quantity in eq. (11) can be large even when the approximation is exact.
This \eps still depends on the variational approximations for discrete graphical models, as the reviewer pointed out. Nevertheless, Corollary 2 is useful: it is completely complementary to advances in providing guarantees for variational approximations (such as tree-reweighted approximations); note that such guarantees are outside the scope of this paper but are an active area of current research for various specialized graphical models.