NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 5716 Semi-Implicit Graph Variational Auto-Encoders

Reviewer 1

This paper proposes a semi-implicit variational inference (SIVI) extension of the Graph VAE model. SIVI assumes a prior distribution over the parameters of the variational posterior, enabling more flexible modeling of latent variables. In this paper, SIVI is incorporated into the Graph VAE framework in a straightforward way. The formulation is simple but possibly new in the graph analysis literature, and the main idea is easy to understand. The proposed model performs well in the link prediction experiments.

Fig. 3 is not reader-friendly in several respects: (i) the panels are simply too small; (ii) we can observe that the posterior distributions learned by SIG-VAE are multi-modal, but readers do not know whether the posteriors of the five nodes "should be" multi-modal. In other words, is SIG-VAE's variational posterior closer to the true distribution than that of VGAE? Is there a way to answer this question more directly?

I cannot fully understand the meaning of the results of the graph generation experiments. What is the main message we should take from them?

I have a few questions concerning the results of the node classification experiment. (i) What kind of data split (train/validation/test) is used in the experiments? The data split has a huge impact on the final score. Is the split the same as the standard split used in Kipf and Welling's GCN paper? (ii) The performance of the proposed SIG-VAE is not very impressive compared to the simple GCN. Why is that? (iii) I think GCN is a strong baseline, but not the best one against which to claim SOTA. In general, GAT [GAT] works better in many cases, including on the Cora and Citeseer datasets; see the experiments in [Wu19].

[GAT] Velickovic et al., "Graph Attention Networks", in Proc. ICLR 2018.
[Wu19] Wu et al., "Simplifying Graph Convolutional Networks", in Proc. ICML 2019.

+ The combination of semi-implicit VI and graph VAE is new.
+ The formulation is concise and easy to understand.
- Some issues in Fig. 3 and in the graph generation experiments are unclear.
- The node classification result is not SOTA (the claim is too strong).

### after author-feedback ###
The authors provided satisfactory answers to some of my concerns. Taking the other reviewers' points of view into account as well, I raised the score.
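To make Reviewer 1's summary of SIVI concrete (a distribution over the posterior parameter yields a more flexible, possibly multi-modal, posterior), here is a minimal toy sketch in one dimension. It is not SIG-VAE's encoder: the mixing map `2 * tanh(5 * eps)`, the value of `sigma`, and the helper names are hypothetical choices made only to show that a semi-implicit marginal q(z) = ∫ N(z; ψ, σ²) q(ψ) dψ can be bimodal even though every conditional is a single Gaussian.

```python
import math
import random


def sample_semi_implicit(n, sigma=0.1, seed=0):
    """Draw n samples z from a toy semi-implicit distribution q(z).

    q(z) = integral of N(z; psi, sigma^2) q(psi) dpsi, where q(psi) is implicit:
    we can sample psi (a nonlinear map of Gaussian noise) but never write q(psi) down.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        # Hypothetical mixing map: pushes psi toward roughly -2 or +2.
        psi = 2.0 * math.tanh(5.0 * eps)
        # Explicit conditional layer: z | psi ~ N(psi, sigma^2).
        samples.append(rng.gauss(psi, sigma))
    return samples


def fraction(samples, predicate):
    """Share of samples that satisfy predicate."""
    return sum(1 for z in samples if predicate(z)) / len(samples)


zs = sample_semi_implicit(10_000)
print("near 0:", fraction(zs, lambda z: abs(z) < 0.5))
print("near -2:", fraction(zs, lambda z: z < -1.5))
print("near +2:", fraction(zs, lambda z: z > 1.5))
```

Most of the mass sits near -2 and +2 with little near 0, which a single Gaussian (as in a mean-field VGAE posterior) cannot represent. As the reviewer points out, this only shows that such shapes are expressible; whether the true posteriors in Fig. 3 are multi-modal is a separate question.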

Reviewer 2

Originality: The paper combines a number of ideas from the literature, and the careful combination of existing techniques leads to really good representation learning for graphs. In that sense the work is original and interesting.

-----POST REBUTTAL-----
I thank the authors for addressing my concerns/questions about a VampPrior version of VGAE as well as my questions about Eqn. 5. In general, the rebuttal includes many relevant experiments addressing the concerns from the review stage, and based on this evidence I am happy to keep my original score for the paper.

Clarity: The paper is generally clear and writes down clear mathematical formulations for all the methods considered.

Quality: The paper has a number of thorough experiments and generally seems to be of high quality in its empirical evaluation. It also gives a clear intuition for why the proposed method is better, and extensively demonstrates and validates it.

Significance: The paper seems like a significant contribution to the graph representation learning literature.

Weaknesses:
- It would be good to better justify and understand the Bernoulli-Poisson link. Why is the number of layers used in the Poisson part of the link? The motivation in the original paper [40] seems to be that one can capture communities: the sum in the exponential runs over the coefficients r_k, each of which corresponds to a community. Here, the sum is over layers. How do the intuitions from that work transfer? In what way do the communities correspond to layers in the encoder? It would be nice to better understand this.

Missing Baselines:
- It would be instructive to vary the number of layers of processing for the representation during inference and analyze how that affects the representations and the performance on downstream tasks.
- Can we run VGAE with a VampPrior to more closely match the doubly stochastic construction in this work? That would help determine whether the benefits come from a better generative model or from better inference due to doubly semi-implicit variational inference.

Minor Points:
- Figure 3: It might be nice to keep the generative model fixed and optimize only the inference part of the model, parameterizing it as either SIG-VAE or VGAE, to compare the representations. It is impossible to know or compare representations when the underlying generative models may also differ.
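Reviewer 2's first weakness concerns the Bernoulli-Poisson link. As a reference point for the community intuition attributed to [40], the sketch below computes the link for a single node pair in the form the review describes: a sum of coefficients r_k inside an exponential, P(A_ij = 1) = 1 - exp(-Σ_k r_k z_ik z_jk). What k ranges over in SIG-VAE (communities or layers) is exactly the reviewer's question and is not settled here; the weights `r` and the latent vectors are hypothetical.

```python
import math


def bernoulli_poisson_edge_prob(z_i, z_j, r):
    """P(A_ij = 1 | z_i, z_j) = 1 - exp(-sum_k r_k * z_ik * z_jk).

    Equivalently, draw a latent count m_ij ~ Poisson(sum_k r_k z_ik z_jk) and
    set A_ij = 1 if m_ij > 0. Term k adds to the rate only when both nodes
    have non-zero weight on component k (a "community" in [40]).
    """
    if not (len(z_i) == len(z_j) == len(r)):
        raise ValueError("z_i, z_j and r must have the same length")
    if any(v < 0 for v in (*z_i, *z_j, *r)):
        raise ValueError("the link assumes non-negative z and r")
    rate = sum(r_k * a * b for r_k, a, b in zip(r, z_i, z_j))
    return 1.0 - math.exp(-rate)


r = [math.log(2.0), 5.0]  # hypothetical per-component weights
p_shared = bernoulli_poisson_edge_prob([1.0, 0.0], [1.0, 0.0], r)    # share component 0
p_disjoint = bernoulli_poisson_edge_prob([1.0, 0.0], [0.0, 1.0], r)  # share nothing
print(p_shared, p_disjoint)
```

Two nodes that load only on different components get an edge probability of exactly zero, which is what makes the per-component reading ("communities") natural in [40]; the reviewer's point is that this reading is less obvious when the components are tied to encoder layers.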

Reviewer 3

The paper is incremental compared to the Semi-Implicit VAE [38]. The general idea of SIVAE is to model the parameters of the VAE's variational (encoder) distribution ($\psi$) as a random variable that one can sample from but that does not necessarily have an explicit form, which results in mixture-like behavior for the VAE. In this work, the authors propose to use that framework for graph data: the idea is to sample from $\psi$ and concatenate the samples with each layer of the graph VAE. The rest follows [38]. They also propose another variant based on normalizing flows, which reads as a bit out of sync with the rest of the paper (an afterthought/add-on).
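The construction Reviewer 3 describes (sample noise and concatenate it with the input of each graph layer) can be sketched as a toy GCN-style encoder in plain Python. This is a minimal sketch under stated assumptions, not the authors' architecture: the symmetric normalization of A + I, the ReLU on hidden layers, standard Gaussian noise, the layer sizes, and the helper names (`noisy_gcn_layer`, `encode`) are all hypothetical. The last layer is left linear, as a layer producing a posterior mean would be.

```python
import random


def normalize_adjacency(adj):
    """GCN-style symmetric normalization D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def noisy_gcn_layer(a_norm, h, weight, noise_dim, rng, relu=True):
    """A_norm [h || eps] W, with fresh Gaussian noise eps appended to every node's input."""
    h_cat = [row + [rng.gauss(0.0, 1.0) for _ in range(noise_dim)] for row in h]
    out = matmul(matmul(a_norm, h_cat), weight)
    return [[max(0.0, v) for v in row] for row in out] if relu else out


def encode(adj, x, layer_dims, noise_dim, weight_seed=0, noise_seed=0):
    """Hypothetical stochastic encoder: fixed weights, new noise at every layer."""
    w_rng = random.Random(weight_seed)  # weights depend only on weight_seed
    n_rng = random.Random(noise_seed)   # noise depends only on noise_seed
    a_norm = normalize_adjacency(adj)
    h = x
    for depth, out_dim in enumerate(layer_dims):
        in_dim = len(h[0]) + noise_dim
        weight = [[w_rng.gauss(0.0, 0.5) for _ in range(out_dim)] for _ in range(in_dim)]
        last = depth == len(layer_dims) - 1
        h = noisy_gcn_layer(a_norm, h, weight, noise_dim, n_rng, relu=not last)
    return h


# 4-node path graph with one-hot node features.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
x = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
draw_a = encode(adj, x, [8, 2], noise_dim=3, noise_seed=1)
draw_b = encode(adj, x, [8, 2], noise_dim=3, noise_seed=2)
print(draw_a)
print(draw_b)
```

With the weights held fixed, a different noise seed gives a different embedding for the same graph, so the encoder defines a distribution over node representations rather than a single point. In a semi-implicit encoder, an output like this would serve as the random parameter $\psi$ (for example a posterior mean), with an explicit Gaussian layer on top; that last step is omitted here.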