NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center

### Reviewer 1

Main comments: On the positive side, the results appear interesting, and the heuristic derivations seem highly plausible. Overall the proposed approach could be an appealing solution for community detection. On the negative side, the paper is not very easy to parse, leaving a lot to the reader to guess about notation and details of explanations. The derivations are quite sketchy, and the case for the proposed approach would be much stronger if i) additional arguments were provided to justify these derivations, and ii) additional numerical experiments were reported to support the claims (eg of gaussianity of eigenvector entries). Also, a comparison with not just (spectral methods based on the) initial version of the Bethe Hessian and adjacency matrix were reported, but also with alternative methods, eg regularized Laplacians. Further comments: -derivation of eq(5) is very quick, some justification of the iid nature of 0-1 variables summing to d_i in the DC-SBM would help. -l120, 'with the expectation still for...': do you mean you are taking a conditional expectation? If so then better use the mathematical symbol E( . |A,d_i) instead. -l180-181: is nu_i(r) the i-th smallest eigenvalue of H_r? I could not find the definition of this quantity earlier in the text. -l214-215, 'the vector u^(p)... is then associated with the p-th smallest eigenvector...': why is that? Is is explained in the supplementary material? If so, provide a pointer to where the explanation is given. -supplementary material A, evaluation of P(A|sigma, theta): you are missing an asymptotically constant multiplicative term due to non-edges. This does not matter in the end as it is absorbed in the normalization constant, but that should be corrected anyway. -supplementary material B: statement tilde(beta)=O(beta_i) does not make much sense. More generally the whole justification of approximate gaussianity of eigenvector entries in this section is too sketchy, and more arguments are needed to make a convincing case. -I did not understand how to read figure 2(b); footnote number ^4 in the caption of figure 2 does not point to anything apparently.

### Reviewer 2

This paper looks at the community detection problem when the underlying network is a degree corrected stochastic block model in the sparse regime (i.e. constant average degree as the number of nodes tend to infinity). This model has been extensively studied and the authors are still able to make a contribution. On the rigorous side, for 2 communities, [16] proves that for \alpha <\alpha_c (notation of the paper see equation (2)), it is impossible to recover the communities better than random guessing and [17] shows that for \alpha>\alpha_c, a spectral method based on the non-backtracking matrix will achieve positive overlap. The authors in [8] suggest to use a symmetric matrix that they named the Bethe Hessian instead of the non-backtracking matrix and demonstrate empirically that a spectral algorithm based on this matrix works very-well. In order to use the Bethe Hessian, on needs to specify an additional hyperparameter called r compared to the non-backtracking matrix. A default choice was proposed in [8] and the paper under review suggests another generic choice for this parameter. With this new choice of the parameter, the spectral algorithm gives a better overlap for the degree corrected SBM. Some mathematical intuitions for this new value is given in section 2. Ways to estimate it are described in section 3. Extensions to several uneven-sized communities are provided in section 4 and experimental validation is done in section 5. -------------- Post feedback: I've read the feedback of the authors. A point I realized in the discussion with other reviewers: I doubt the overlap obtained by your spectral method will be 'optimal'. Usually spectral methods are used as initialization for a greedy method improving the overlap. The only optimal result with a rigorous proof that I know is from Mossel Neeman and Sly in Belief propagation, robust reconstruction and optimal recovery of block models where they show that running a variant of BP with a good initialization is optimal well above the threshold for SBM. Hence your remark 2 is probably not correct and it would be nice to have experiments with BP.

### Reviewer 3

This paper studies clustering/community detection in the stochastic blockmodel with degree inhomogeneous vertex degrees. Major advances have been made in recent years on optimal algorithms for clustering in sparse stochastic block models, including a series of works analyzing belief propogation methods and developing spectral algorithms based on nonbacktracking random walks. This paper studies a family of spectral methods, collectively called the Bethe Hessian, for community detection in sparse graphs sampled from the *degree-corrected stochastic block model*, a variant of the stochastic block model which allows for a variety of degree distributions. High degree vertices typically affect the spectrum of the adjacency matrix by introducing spurious eigenvalues, and so other spectral methods are needed to correct for the degree inhomogeneaty. The main result of the paper is a detailed heuristic derivation (somewhat statistical physics inspired (as is common in this area)) of the optimal choice of hyperparameter for the Bethe Hessian spectral method. This derivation is validated with some experiments on real-world networks. I have mixed feelings about this paper. I do not think that it can be accepted on the basis of the theory it presents -- while theoretical motivations for hyperparameter settings are important, I think on its own this is too narrow for NeurIPS acceptance (and it is a demerit that the arguments are not quite rigorous). On the other hand, it does propose an algorithm which seems to beat the state of the art for community detection in some real data -- clearly this is an important benchmark. I think the paper would be substantially improved if the experimental aspect were more deeply explored -- it would be good to compare on real data to e.g. the nonbacktracking operator, and also to demonstrate empirically that the proposed paramter setting for the Bethe Hessian is optimal, rather than only exhibit experiments on one other possible parameter setting. UPDATE: I have read the author response. I am glad to hear that experiments with other values of $r$ will be added to the paper. Thanks for pointing out that there exists a value of $r$ such that clustering with the Bethe Hessian should act like clustering with B; even looking at ref [8] briefly it is not clear to me how formal this relationship is.

### Reviewer 4

I think this is an interesting paper, establishing a choice of the parameter in the Bethe Hessian matrix that is threshold-optimal for the sparse degree-corrected block model. This is novel up to my knowledge. The random matrix theory (establishing the value of the overlap) behind the work is interesting and non-trivial. I see two slight drawbacks: ** The paper is not very clear about the comparison with performance of the non-backtracking matrix spectral method for the same problem. From my understanding that method has the same threshold. Does it have a worse overlap? I think so, but could not find a clear statement about it in the paper. ** Remark 2 and related appendix in SM seems non-sensical to me. The authors use the works Bayes-optimality and Nishimori condition in a way different from the literature to which they refer without defining them properly. Clearly a spectral method will not be able to reach a Bayes-optimal overlap, such as degree-corrected belief propagation is conjectured to reach. The derivation in the Appendix is more than heuristic. The authors end up with a mapping into an Ising model that would not even maintain the right sizes of the groups if exact sampling was implemented for it.