__ Summary and Contributions__: The paper proposes a topic model for networks with text on the nodes (in the vein of the relational topic model). The main innovation is to include a hierarchical structure via a deep probabilistic model by stacking gamma-Poisson factor models, with inference achieved by a sophisticated VAE approach, Weibull graph autoencoders.

__ Strengths__: Although there has been a lot of work on modeling networks with text data via joint topic modeling and network modeling, the use of a deep model in this context is new to this work (as far as I know). This is a useful advance enabling the recovery of complex yet somewhat interpretable models of networks with text on large datasets, with a moderate degree of novelty.
The paper builds toward its final approach in several increasingly complex models and inference algorithms, covering a substantial amount of work as a contribution.
The proposed model is carefully and insightfully designed to admit tractable and scalable inference. The inference algorithm is principled and leverages recent innovations.
This paper is relevant to the NeurIPS community.

__ Weaknesses__: Results were mixed against the strongest baseline, SIG-VAE, on the link prediction task, which is arguably the most important evaluation task: the proposed method performs only slightly better on one of the three datasets and slightly worse on the other two. The authors claim that the memory requirements are much lower than for this baseline, with further discussion in the appendix. This is fine, but the authors should then report hard numbers on memory usage etc. to back this claim up. Similarly, the paper states that SIG-VAE "requires an unaffordable memory footprint," yet the authors evidently could afford it, since they report results for that method.

__ Correctness__: As far as I can tell, the claims are correct. I have not verified the finer points of the derivations, but they seem fine at a high level. The evaluation is reasonable, with a broad set of baselines and several datasets. Details on how hyper-parameters were selected are in the supplementary material; these should probably be moved to the main paper.

__ Clarity__: Yes, the paper is generally well written and well argued. There are a few typos which I will list below.

__ Relation to Prior Work__: While the paper mostly does a good job of contextualizing relative to prior work, there are a couple of issues. There is a substantial literature on jointly modeling networks and text with topic models that this paper only touches on. It would be important to add a section on this, even if it is in the appendix. See for example:
Guo, F., Blundell, C., Wallach, H., & Heller, K. (2015, February). The Bayesian echo chamber: Modeling social influence via linguistic accommodation. In Artificial Intelligence and Statistics (pp. 315-323).
He, X., Rekatsinas, T., Foulds, J., Getoor, L., & Liu, Y. (2015, June). HawkesTopic: A joint model for network inference and topic modeling from text-based cascades. In International conference on machine learning (pp. 871-880).
Zhang, X., & Carin, L. (2012). Joint modeling of a matrix with associated text via latent binary features. In Advances in Neural Information Processing Systems (pp. 1556-1564).
Similarly, a few important references to gamma-Poisson factorization models for text or networks should be added:
Canny, J. (2004, July). GaP: a factor model for discrete data. In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 122-129).
Gopalan, P., Hofman, J. M., & Blei, D. M. (2013). Scalable recommendation with Poisson factorization. arXiv preprint arXiv:1311.1704.
Gopalan, P. K., Charlin, L., & Blei, D. (2014). Content-based recommendations with Poisson factorization. In Advances in Neural Information Processing Systems (pp. 3176-3184).

__ Reproducibility__: Yes

__ Additional Feedback__: Minor suggestions/typos:
-In the introduction, prior work is sometimes referred to in the present tense. E.g. "variational autoencoder (VAE) [21, 22] is extended" - "the variational autoencoder ... was extended" would be better. Similarly in several other sentences.
Pg 2, "researches" - "research"
Pg 6, "the remain documents", "jewis"
Pg 7, "balance" - "balances", "whose performance are evaluated", "like the SEAL constructs" - "which constructs", "the G2G assumes" - "the G2G which assumes", "likelyhood", "into loss function" - "into the loss function"
Pg 8, "randomly select 25 nodes" - "randomly selected 25 nodes"
In Figure 3, it would be more informative to compare these results to a single layer graph Poisson factor model. E.g., can this simpler model adequately capture the adjacency matrix, and is it as interpretable as the proposed model?
-----
Post-rebuttal comments: The authors answered my main concern, which was the need to provide hard numbers for the memory benefits of the proposed approach. It is important to include these numbers in the revision, along with the other promised changes. My score will remain unchanged.

__ Summary and Contributions__: The paper introduces Graph Poisson-Gamma Belief networks and Weibull Graph Autoencoders for relational (graph) data. The proposed three-layer model(s) matches or exceeds current state-of-the-art performance on multiple prediction tasks, such as node classification and link prediction.

__ Strengths__: The paper's main strengths are that the new model matches or outperforms current state-of-the-art methods and is, in many respects, also interpretable. The model also enables use of the \beta hyperparameter to shift the model's focus from the links (adjacency matrix) to the nodes (textual data). The authors show that models can focus too much on one part, reducing performance on the other. I also liked the BerPo-link discussion/contribution.

__ Weaknesses__: The main weakness of the paper is its limited originality. The proposed model is an extension of previously known models to the setting of graph data.

__ Correctness__: The paper seems to be correct in general. I could not spot any direct errors when reading the paper and the supplementary material.

__ Clarity__: The paper is clear, concise, and easy to follow. The only limitation is that some parts of the paper are difficult to read due to tiny fonts (e.g., Figures 2 and 4). A similar problem can be found in the appendix.

__ Relation to Prior Work__: Yes, the related work seems to be complete and is discussed in relation to the work.

__ Reproducibility__: Yes

__ Additional Feedback__: The authors propose a Gibbs sampling algorithm that is described as very efficient. I would expect the parameters to be highly correlated, especially in a three-layer model. Could the authors elaborate on this: efficient in what sense? I assume the Gibbs sampler is used as a stochastic optimization algorithm rather than as a way to explore the whole posterior?
The link activation variable u_k essentially works at the topic level to give strength to individual topics for the links. This variable seems a little limiting to me. Have the authors considered a matrix for u instead, with entries u_{k_j,k_i}, to better capture interaction effects between different topics? This seems quite straightforward and would not increase the number of parameters much.
I also miss a discussion of why the Weibull encoder improves so much upon just using the original model with the Gibbs sampler. Since the improvements are quite significant, I think it is essential to discuss why we see them. Where does this additional performance come from?
#### AFTER REBUTTAL ####
I would like to thank the authors for the rebuttal. I think it is quite clear that the chain is not mixing well (during these 2000 iterations); this is clear in the left plot of Figure 2. That said, I am not sure this actually matters much given how the authors use the Gibbs sampler, since the performance is still good. If the paper is accepted, the authors should try to clarify what they mean by an efficient Gibbs sampler in light of Figure 2.
I also appreciated the discussion in the rebuttal of why the Weibull encoder improves performance. I reread Section 5.3, and I do not find it as clear there as in the authors' rebuttal.

__ Summary and Contributions__: This paper first proposes Graph Poisson Factor Analysis (GPFA), a relational topic model for analyzing a collection of interconnected documents. The paper also provides a closed-form Gibbs sampling approach to approximate the posteriors. Moreover, it proposes GPGBN to further explore multilevel semantics, with two Weibull-distribution-based variational graph autoencoders for efficient model inference and effective network information aggregation.

__ Strengths__: This paper is well-written and easy to read. The authors provide sufficient details for each model, and the experimental part is also convincing.

__ Weaknesses__: It seems the empirical performance of this work is only marginally favorable compared with the other baselines.
====post rebuttal=============
Thanks for the feedback. The rebuttal addresses my concerns and I will not change my score.

__ Correctness__: Might be correct.

__ Clarity__: Yes

__ Relation to Prior Work__: Yes

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: This paper proposes a new document network model, Graph Poisson Gamma Belief Network. Such a probabilistic model can model the document network well and has the ability to capture uncertainty.

__ Strengths__: The paper addresses an important problem in the text mining area. The authors have proposed a new Weibull graph autoencoder, which is innovative. The topic is within the scope of NeurIPS.

__ Weaknesses__: The Graph Poisson Factor Analysis and Graph Poisson Gamma Belief Network are straightforward extensions of Mingyuan Zhou's earlier work on the Poisson Gamma Belief Network (NIPS 2015). In the experiments, GCN and other graph neural network variants should be included to show the performance on network modeling. Since both the title and the main comparative methods concern topic models, it would be better to evaluate the learned topics against other topic models, in addition to the link prediction tasks.

__ Correctness__: The claims and method are correct.

__ Clarity__: The paper is well written and easy to follow. The model properties are well explained.

__ Relation to Prior Work__: Yes, it is clear.

__ Reproducibility__: Yes

__ Additional Feedback__: