NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center

### Reviewer 1

I would like to thank the authors for their response. ============================= Originality ----------- * Although the literature in node representation learning and community detection is quite vast, this paper is positioned in a way that diverts from the norm. The method proposed is treating the two tasks as one, and jointly learns embeddings, rather than going through iterations of optimizations. * The work includes an adequate number of citations to recent and older works in the tasks at hand. Moreover, the authors proceed to compare against a big number of these methods in the experimental section. * The novelty of the work is still limited, since the techniques used are well-known and have been applied to other domains already. Still, this work is the first to apply and combine them for community detection and node representation learning. Quality -------- * The theoretical analysis in this paper is quite shallow, since the techniques that are being used are already known and established. * The experimental section of the paper is one of the strongest points of this work. It is very extensive, compares the proposed work against a variety of other methods, and the comparison is done on a number of different tasks and using different metrics. * The authors are honest in evaluating their work, especially in Table 3. They still provide a reasonable explanation for the performance of their method compared to either ComE, or the rest of the methods. Clarity ------- * The paper is overall well-written; it feels like a polished work. * There are places that could be improved in terms of details, which I will mention below. Significance -------------- * Looking at novelty, this paper's strong points are the methodological contribution that it makes through the proposed modeling, and the kind of empirical analyses that it enables. In the rest of the dimensions, i.e., theoretical and algorithmic, the paper is not adding anything novel to the literature. * The proposed model enables a unique approach to data exploration. Indeed, as showcased in the plots, the learnt embeddings can be used to produce very interesting plots for quite large and messy networks. * As I mentioned previously, this method is a significant and highly non-trivial addition to the tools for network exploration. These tools are finding more and more users and use cases, which means that the potential impact of this work is high.

### Reviewer 2

The joint analysis of node representations and community structure is an interesting and original avenue for analyzing networks. I agree with the authors that this should be much further investigated in the future, and indeed that more should be done in the networks realm in the area of integrative analysis. The comparison with current community detection and node representation algorithms was also very useful, and making the code publicly available is beneficial. The paper would have benefited from considering theoretical guarantees that are available to community detection methodologies, such as the detectability limits of the method as well as asymptotic properties of the method. Also, it wasn't entirely clear from the applications what the additional benefit was to applying vGraph over applying community detection and node embedding methods individually.

### Reviewer 3

- The paper proposes a unifying framework, named vGraph (note: there are a few other, unrelated, approaches in the literature that share this moniker), for simultaneously pursuing node representation learning and community detection. The key insight is two learn two embeddings per node, one which captures the community assignment, and one which captures the graph topology. Moreover, a simultaneously learned embedding of the communities ties these two embeddings together in the probabilistic framework. - You are implicitly defining communities based on assortativity. It would be interesting to explore whether different choices of $\alpha_{w,c}$ and $d$ in (9) would allow for other community structures (e.g., disassortative communities or core-periphery communities) to be uncovered. -In all of your examples, $K$ (the number of communities) is relatively small and the communities recovered (at least upon first inspection) appear to be relatively balanced. Is your method effective in situations with many small communities (large $K$)? - For your methodology, when an oracle $K$ is unknown, how is it chosen? - In Section 2, subsection "Community Detection," you cover two groups of methods, matrix factorization-based methods and model-based methods, and claim that the computational complexity of these approaches is high. There are a bevy of other approaches (e.g., modularity optimization methods such as the Louvain method and Clauset, Newman and Moore's greedy modularity maximizing algorithm) that scale well and show excellent performance on real data applications. Moreover, the Louvain method (from Blondel et al. 2008) runs on graphs with order $>10^8$, and thin SVD based matrix factorizations can be applied to networks with millions of nodes using only relatively standard grid-based computing resources. - In deriving your probabilistic model and ELBO lower bound, you only consider a single $(c,w)$ pair. How are you aggregating over all edges? The model effectively extracts two node embeddings, one capturing communities and one capturing the edge structure, with the community embeddings tying the two together. Are the final node embedding $\phi$, $\varphi$ or a concatenation of the two? - Please list citations for the real data networks used in your experiments in the main text. The networks cited from [1] in the supplementary do not appear to be available at [1] anymore. -On the real data networks, what is the runtime of vGraph (for example, on Dblp-full) and competing models? There is no mention of the scalability of your approach. -Can you modify vGraph to incorporate nodal attributes? - Regarding Table 3, both NMI and Modularity can be computed for clusterings with $K'\neq K$. In the setting when $K$ is unknown, how would your approach compare to approaches that automate this model selection (i.e., the Louvain method)? - Often, clustering is an inference task with multiple (equally correct) truths (see, for example, Peel, L., Larremore, D. B., & Clauset, A. (2017). The ground truth about metadata and community detection in networks. Science advances, 3(5)). It is hard to judge the effectiveness of your approach based on its ability to recover a stated "ground truth," when the communities recovered by competing algorithms may be equally correct for recovering different (still true) communities. - In the classification experiment, what classifier are you using on the node-embeddings? (minor typo: vGraph is not best performing on 9 out of 12 datasets; there are only 6 datasets).