__ Summary and Contributions__: In this paper, the authors study the potential of graph neural networks in the problem of substructure counting.
C1. Theoretical results are provided, suggesting the specific classes of graphs whose counting problems can or cannot be addressed by practical implementations of existing GNNs or k-WL variants.
C2. Local Relational Pooling (LRP) is proposed to enhance the capability of counting substructures.
C3. Empirical study on real-life and synthetic datasets suggests the proposed LRP outperforms existing GNNs in multiple substructure counting tasks.

__ Strengths__: S1. This paper analyzes the limitation of existing GNNs in terms of substructure counting from both theoretical and empirical perspectives.
S2. The authors propose a unique angle that enhances GNNs capability in substructure count by improving methods in local information aggregation.
S3. The proposed LRP brings competitive performance in a few counting problems.

__ Weaknesses__: W1. The authors may need to justify the concrete value of solving substructure counting problems by GNNs. Substructure counting is a problem that has been widely discussed in the domain of theory, database, and data mining. Existing combinatorial algorithms enhanced with improved system designs [1] enables accurate counting on large-scale graphs for simple substructures, such as triangle and stars discussed in this work. Compared with existing combinatorial method based solutions, what is the essential gain from using GNNs?
W2. The design of LRP may also need both theoretical and empirical justification, compared with existing ideas. For example, in GraphSage, the idea of using sequential models (e.g., LSTM) to aggregate local information has been proposed. Given the same assumption that one is able to enumerate neighborhood permutation, it is non-trivial why the proposed LRP is better. The authors may provide more evidences to highlight the unique value in the proposed LRP, compared with existing solutions.
W3. The presentation of the empirical results could be improved. In table 1, it is strange to report top and median performance, as the tested input graphs may not be aligned for different techniques. For example, the top 1 tested graph for solution A may not be the top 1 tested graph for solution B. When the input is different, it is confusing to compare their performance. Instead, it could be better to report average performance with mean and standard deviation for a same set of graphs.
Reference
[1] Suri, S et al. Counting Triangles and the Curse of the Last Reducer. WWW 2011.

__ Correctness__: In general, the claims are correct, but the empirical study could be further improved, as suggested in the section of "weakness".

__ Clarity__: Yes

__ Relation to Prior Work__: In general, the related work is well addressed. Meanwhile, the authors may need to justify the unique value in LRP, compared with existing sequence model based neighborhood aggregation methods.

__ Reproducibility__: Yes

__ Additional Feedback__: Could the authors share some insights on why it is a good direction to improve GNN for substructure counting problems? For substructure counting, it is inevitable to deal with the subgraph isomorphism problem, which is NP-hard. In other words, it will be challenging to develop an efficient algorithm for exact counting or approximation with any approximation guarantee for arbitrary graphs and substructure patterns. For practical cases in real-life applications, existing combinatorial methods have made progress to improve scalability on large-scale data. To be concrete, what are the extra dimensions of improvement GNNs or deep learning methods could bring?
======After rebuttal=====
Appreciate the authors' effort on addressing the questions. Indeed, the authors may provide a new perspective on "learning which substructure to count", which could be beyond the scope of existing combinatorial approaches. But still, without concrete evidences, it is a bit hard to envision the extra value of "learning which substructure to count".
- Compared with existing GNNs which implement the concept of "counting" by pooling functions, it is unclear why accurate counting matters in prediction tasks.
- Given the expressive power of GNNs, the impact of "learning which substructure to count" could be also limited.
In sum, the value of "learning to count substructures" remains a bit vague. I keep the score unchanged.

__ Summary and Contributions__: This paper studies the expressive power of graph neural networks (GNNs) through their ability to subgraph-count and induced-subgraph-count. The paper examines MPNN, k-WL, and k-IGN. The theoretical results are dense and solid. Besides, authors derive a Local Relational Pooling (LRP) approach to count substructures in a graph, which is proven to be accurate empirically. Experiments further validate the proposed theories.

__ Strengths__: The idea in this paper is novel. The proposed theories are promising. These can be used to guide future directions in developing GNN architectures. The empirical evaluation is sound. The content is relevant to the NeurIPS community.

__ Weaknesses__: The proposed method LRP is computationally expensive and thus can be limited in practice. See more comments in additional feedback.

__ Correctness__: Yes

__ Clarity__: Yes

__ Relation to Prior Work__: Yes

__ Reproducibility__: Yes

__ Additional Feedback__: 1. LRP is computationally expensive. The time complexity, O(n * (D!)^D^l * k^2), greatly depends on the degree of nodes. The paper considers LRP-1-4 (l=1), which makes the computation feasible but could limit the model's power. I am curious if the author managed to apply LRP with l>1 into any other datasets.
2. In (1), how to decide which subtensor C_k to use for node i?
3. In (2), where does the term $MLP(D_i)/|S^BFS_n|$ come from? This term is not shown in (1).
4. How the value bold in table 2? GIN+VN (77.07 +- 1.49) should be comparable to LRP-1-4 (ES) (77.19 +- 1.40), but GIN+VN is not bold.
Thanks for the author response. I still think it's good addition for understanding GNN's capacity, and I'll keep my score.

__ Summary and Contributions__: In this paper the authors make a significant contribution towards understanding the expressive power of graph neural nets on a problem of great interest. A key concept of interest is the Weisfeiler-Lehman hierarchy, initially proposed by the expert on the GI problem Laszlo Babai. In this paper several neat results are proved. First, the authors prove MPNNs and 2nd order Invariant Graph Networks cannot count any connected induced subgraph of 3 or more nodes, but on the positive side they can count subgraphs that are star-shaped. Given the triangle rich structure of real-world networks this shows a severe limitation of these learning models.
They also propose a new architecture that performs really well on real-data.
[Update: I thank the authors for their feedback. My opinion remains the same, i.e., that this is a good paper that sheds light into a challenging problem.]

__ Strengths__: - The paper is theoretically grounded.
- The authors present a series of new important results, that advance the understanding we have of graph neural nets.
- The architecture in section 5 performs really well on real-data
- The experiments are well performed

__ Weaknesses__: The writeup could be improved. Specifically, it was hard to parse some of the contributions in the main text, and some of the key results are stated without any intuitions behind the proofs.

__ Correctness__: Yes, to the best of my knowledge the proofs are correct.

__ Clarity__: Yes.

__ Relation to Prior Work__: Yes.

__ Reproducibility__: Yes

__ Additional Feedback__: I understand that the page limit of 8 pages is limiting for a contribution like this one, but the authors could do a better job in revealing some more ideas behind their key results in the main text.

__ Summary and Contributions__: This paper propose a theoretical framework for studying the ability of GNNs to count attributed substructures based on both function approximation and graph discrimination.

__ Strengths__: - The paper established that neither Message Passing Neural Networks (MPNNs) nor 2nd-order Invariant Graph Networks (2-IGNs) can count subgraphs more than 2 nodes.
- The paper proposed that MPNNs and 2-IGNs can count subgraphs that are star-shaped.
- The paper proposed a GNN model based on [41] called Local Relation Pooling.

__ Weaknesses__: - The strength of the paper lies in the theoretical analysis, however, the findings are already known and not surprising, as it is well-known that GNNs are at most as expressive as WL.
- Evaluation of LRP should be measured using standard benchmark graph datasets (MUTAG, Enzymes).

__ Correctness__: Yes, the claims are correct.

__ Clarity__: The paper is well presented, but should be checked for typos, some examples:
- In line 71, "effectivel graphs"

__ Relation to Prior Work__: Related work is discussed well.

__ Reproducibility__: Yes

__ Additional Feedback__: ============== After Rebuttal ================
Thanks for taking the time to write the rebuttal. I increased my score.