__ Summary and Contributions__: The authors of this submission propose a GCN-based method to conduct disentanglement in the graph domain. The proposed method disentangles the input graph into several factor graphs in a graph level, and allows for multiple relations between a pair of entities. This is done by the graph-level attentional aggregation strategy during the training process. A new metric is designed to evaluate the disentanglement performance quantitatively. The proposed method achieves the best disentanglement performance and very competitive performance on the downstream tasks even without the additional edge information.

__ Strengths__: 1)This paper is well written. The proposed method is well motivated and the description is clear.
2)Besides the qualitative evaluation used in the previous methods, this paper provides a legitimate and practical quantitative evaluation protocol to evaluate the performance of the disentanglement in the graph domain, which could be of interest to a large audience.
3)The experimental results on the ZINC dataset are impressive. The proposed method achieves very similar performance as the model that uses the edge information. Such a result aligns well with the claim that the disentangled factor graphs can further boost the performance of the downstream tasks.

__ Weaknesses__: 1) As the first try on the quantitative evaluation of the disentanglement performance in the graph domain, the proposed evaluation protocol is interesting and deserves attention from a larger audience. But I have one concern about the computation of the $GED\_E$. As mentioned in the main paper, computing the full GED between the predicted factor graphs and the ground truth can be too expensive. So $GED\_E$ assumes that the order of nodes is fixed and correct. This would be a problem at least for the synthetic dataset since mixing the factor graphs together may generate some new factor graphs that are not included in the ground truth. Correct me if I am wrong.
2)In Tab. 1, it is great to have the random method to show the lower bound of the evaluation metric, but why only the $GED\_E$ is reported? The std of the random method should also be reported.
3)It would be good to have a visualization of the generated features from the DisenGCN.

__ Correctness__: The claims of the proposed method are evaluated empirically on several datasets include synthetic and real-world ones. The empirical methodology to conduct graph-level disentanglement is reasonable.

__ Clarity__: This paper is well written. The intuition of the proposed method is clearly conveyed. The description of the main method is concise.

__ Relation to Prior Work__: The prior works have been discussed thoroughly and the difference of the proposed method from other methods is clearly discussed

__ Reproducibility__: Yes

__ Additional Feedback__: Please address my concerns in the Weakness Section.
Some Typos,
Line30, CNN-bsed --> CNN-based
line 45, real-word --> real-world
line 152 diverge --> diverse
line 157 take palce --> take place
line 187 sythetic --> synthetic
After-rebuttal
I have read the responses and other reviewers' comments. The authors have addressed most cocerns, so I keep the original score.

__ Summary and Contributions__: The submission proposes a new architecture of graph neural networks. In this new architecture, the model factorizes the graph into several graphs and then put GNNs over these graphs, then the learned representations are concatenated to get the final representation.
Comparing to previous methods, which need graph factors beforehand, this method autometically learns factor graphs from the input graph.
The empirical evaluation indicates that the proposed algorithm improves over several baselines.

__ Strengths__: The strengh of the proposed method is the learnable factorization. This design does not require manually designed graph factors. Furthermore, the graph factor learning also have chances to improve the model performance.
The work is very relevant to the community.

__ Weaknesses__: The expriment of the proposed method is weak. The comparison does not include recent methods on graph classification.
The performance improvements are marginal on real datasets.
Does the method work for node classification? If so, then there should be a comparison with [Liu et al. 2019].

__ Correctness__: The performance comparsions have some flaws. In table 2, the "2" column, all performance numbers are the same but only the proposed method are in bold.
It seems that performance values are bold according to the mean. Do you do paired t-test to compare different algorithms?
The submission claims that the proposed model are able to disentangle graph factors. However, the train procedure encourages the reduction of training error as well as the diversity of factors. Why do these factors need to be consistent with the groundtruth factors?

__ Clarity__: The paper is hard to understand. I expect to see some high-level formulation of the problem and the motivation behind, but the method section starts with detailed introduction of the architecture.
Some notations are not defined. For example, what's readout and f in (2)?
What's the pooling method in graph classification?

__ Relation to Prior Work__: Prior work are discussed.

__ Reproducibility__: No

__ Additional Feedback__:

__ Summary and Contributions__: This paper propose a new graph neural network Factor GCN which first explicitly construct several factor graphs based attention-like mechanism, then perform feature aggregation on each factor graphs separately. The main contribution of the paper is (i) a new GNN framework that can learn disentangled graphs; and (ii) experimental results show the effectiveness of the proposed method for learning disentangled graphs and graph classification

__ Strengths__: 1. The idea is interesting and novel
2. Experimental results show the effectiveness of the proposed method for learning disentangled graphs and graph classification
3. The paper is well written and easy to understand

__ Weaknesses__: 1. The proposed framework is very similar to GAT, except some minor difference in the attention-mechanism and the discriminator. The authors are encouraged to add the discriminator to GAT to show if the designed approach for constructing factor graphs is really more effective than GAT.
2. Some details are missing. For example, in figure 3, are the correlation analysis by averaging all the graphs or only on one graph? In Fig. 4, it is unclear on which dataset are the experiments conducted. When varying lambda,what is the number of factors and when varying number of factors, what is the value of labmda.

__ Correctness__: Yes

__ Clarity__: Generally, the experimental is well written except that some details are missing.

__ Relation to Prior Work__: Yes

__ Reproducibility__: Yes

__ Additional Feedback__: Thanks for the rebuttal.

__ Summary and Contributions__: The goal of paper is to find disentangled representation for nodes of a graph. The method named FactorGCN has three steps:
1) Disentangling: Extracting disentangled factor graphs (which are independent as much as possible)
2) Aggregation: Aggregating features of nodes in each factor graph by utilizing GCN layers
3) Merging: Concatenation of features of one node in different factor graphs
Its goal is providing low-dimensional features for nodes of the graph that are disentangled. The final goal is to have better results in the graph analysis tasks (like graph classification).

__ Strengths__: The authors have worked on the experiments to show that the effectiveness of the proposed method. They have also proposed two metrics for evaluating how good the factor graphs are. The introduced synthetic dataset which makes us capable to compare factor graphs with the ground truth ones is also interesting.

__ Weaknesses__: 1) The novelty is limited.
2) Although the paper starts with the claim that their main focus is on the multi-relation disentangling, there is nothing explicit in the method that emphasizes on the multi-relational graphs. It's not also provided explicitly in the experiment part. Can the proposed method give a heterogenous or multi-relational network (for which the type of edges is available and thus can be utilized for evaluation) and distinguish edges of different types by factorizing?
3) It seems that the graph classification loss may not provide enough supervisory signal for disentanglement. Why does enforcing classification of factor graphs make them disentangled necessarily?

__ Correctness__: The authorsâ€™ claims are supported by extensive experimental results.

__ Clarity__: The paper is not well written. Some parts of the paper is not clear. It's not easy to understand the final goal of the paper and what is the superiority of the proposed method to the previous work.

__ Relation to Prior Work__: As said before, the paper starts with the claim that the main focus is on the multi-relational graphs, but none of the previous methods that focus on the multi-relational data is mentioned. It's not clear that the focus of paper is on what task in graph analysis. If the method is not task-specific, it should be tested on different tasks and prove the claim.

__ Reproducibility__: No

__ Additional Feedback__: 1) In Figure 2, the disentangled factor graphs have structures with different number of nodes that is not compatible with the description of method (i.e. "we first convert the continue value in the factor graph to 1/0 value by setting the threshold to make the number of edges in this two graphs are the same").
2) Why did the authors focus on graph classification and not on node classification? There is nothing specific to graph classification, on the other hand, there are some ambiguity in the graph classification.
3) Why hasnâ€™t the proposed method been compared with IPDGN?
4) What are the input features for each dataset, if any?
5) You should compare your method with some graph-classification methods, not just methods for node classification like GCN.
6) Isn't a sigma on e in the right-hand-side of Eq. (3) required?
7) There are many typos in the paper like:
- CNN-bsed -> CNN-based
- continue value -> continuous value
- this two graphs -> these two graphs
- it can in fact performance well -> it can in fact perform well
- only based the its structure
- rate of 0.002 No weight -> rate of 0.002. No weight...