__ Summary and Contributions__: ===================== POST-REBUTTAL RESPONSE ===================
The authors promised to address most of the concerns in my review.
I am mostly concerned about their statement that SUB-GNN "proposes a new field of study: subgraph classification" since Meng et al, 2018 had already studied the same task. I will agree that this "is the first method that works with varying subgraph sizes", and "can easily deal with large subgraphs". You don't need to claim to be the first on everything, it weakens the other claims in your paper.
It is OK to have an engineering advance on an important topic. A practical approach being grounded in network science theory is not equal to making a theoretical contribution on our understanding of subgraph classification using neural networks.
The Monte Carlo procedure uses sampled lighthouses but it is not averaging these samples. Odd. The authors would likely see performance improvements with an averaging.
All said, I think the paper should be accepted as long as the authors make the changes they have promised and add some discussion about the point I raised.
================================================
This work considers a graph representation learning method that, unlike GNNs, can give embeddings to subgraphs rather than just nodes. While not the first work on the topic, it is the first to give a method that work with varying subgraph sizes. It can also easily deal with large subgraphs.
The method is a great engineering effort, drawing ideas from multiple sources. It is a bit of a kitchen sink approach, with little theory guiding the decisions.

__ Strengths__:
Theoretical grounding: This is an engineering effort that draws inspiration from a multitude of methods. Some of the decisions are arbitrary, but it does not detract from the engineering feat.
Significance and novelty of the contribution: First method that can give subgraph representations to large variable-size subgraphs. It is not the first method designed to give representations to subgraphs (see Meng et al., 2018).
Empirical evaluation: The work shows a lot of interesting tasks. Nice job.

__ Weaknesses__: - It is not clear how the method can be inductive (applied to new graphs). It relies heavily on the subgraph anchor points. Are the embeddings averaged over all possible sampled anchor points? It is also unclear how the subgraph anchor points can be chosen in graphs of different sizes. In some of the smaller graphs, how do we ensure the anchor points do not overlap? And if they overlap, how well is the method supposed to work?
- The work offers very little in terms of theory. A lot of decisions are made but it is hard to grasp why all these decisions came together into a single method.
- What is about random walks that is so fundamental for the subgraph representation learning?
- What is the effect of having node representations trained with a GNN *(GIN)?
- Why not use a direct measure of graph similarity (e.g., edit distance) rather than triangle-biased *random walks* and degree sequences? And why Dynamic Time Warping as the base metric of these random-walk-chosen degree sequences?
- What is the effect of the anchor point sampling procedure?
- The method has a very large number of hyperparameters (the many subgraph choosing methods, the random walk parameters, the similarity measure, etc.). It is hard to think that one can devise a methodology to automatically choose most of these hyperparameter without facing serious problems with family-wise error rates.
- The description of the experiments offers little assurance that there is no leakage of the test data into the training data, since the train and test subgraphs can overlap. "[The] subgraphs are divided into 80/10/10 split." is not reassuring to someone who is worried about leakage.

__ Correctness__: - The paper claims that permutation invariance of h_{X,c} removes position-awareness. It can't really remove all position information. This statement needs to be more precise.

__ Clarity__: The description of the method is hard to follow. In part because of typos and in part due to the notation, but....
- Most importantly, the only complete description of the method appears in the Appendix, in form of Algorithm 1. I was not able to understand the approach just reading the main paper (not even roughly).
After spending a lot of time in the Appendix, I was able to grasp how it works.
Main paper typos:
The notation has bugs and is hard to follow:
- S^{(C)} should be S^{(c)}? If MSG_X is not a function of C and A_x^{(q)}, how can we build H_{X,c} from it?
- \cdot is used for matrix multiplication but maybe also Hadamard product?
- \{{\bf z}_c \forall c \inS\} => is missing a comma: \{{\bf z}_c, \forall c \inS\}
Appendix typos:
Algorithm 1: S^{(C)} for c=1,...,R should be S^{(c)} ??
Algorithm 2: For m = 1, ..., M should be For w = 1,...,M ??

__ Relation to Prior Work__: There is some prior work whose citation is missing. Specifically, this is not the first graph representation learning work to consider subgraph tasks. The earliest work I am aware of was (Meng et al., 2018), which I think only dealt with (small) fixed-size subgraphs. The approach was interpretable, which I thought was interesting at the time.
Meng, C., Mouli, S.C., Ribeiro, B. and Neville, J., 2018, April. Subgraph pattern neural networks for high-order graph evolution prediction. In Thirty-Second AAAI Conference on Artificial Intelligence.

__ Reproducibility__: No

__ Additional Feedback__: I don't think there is much to fix with respect to my other comments, except:
1. Convince the reader that there is no leakage between test and train/validation.
2. How should all these hyperparameter be chosen without high family-wise error rates?

__ Summary and Contributions__: The paper proposes to consider subgraph classification as a new task for graph neural networks and proposes a model tailored to this task, as well as corresponding datasets.
In a nutshell, the model is GNN that uses standard message passing in three channels (also, factors). Messages are passed within each subgraph and between a subgraph and sampled anchor nodes from the graph. The idea is interesting and definitely relevant to NeurIPS. Overall, the paper is well written, but it is very dense -- since the model proposed by the authors is rather involved and all the general explanations of the new tasks and datasets take space. Nevertheless, I think, the authors have done a great job and the paper provides useful contributions for the NeurIPS community. Overall, I think the submissions provides enough contribution to be accepted. Details are listed below.
================
I acknowledge the rebuttal. Since the other reviewers seem to have similar problems with readability and understanding, and pointed out several further errors, I decreased my scoring.

__ Strengths__: - propose a new field of study: subgraph classification
- the model itself is interesting
- propose corresponding baselines and created datasets

__ Weaknesses__: In my opinion, there are just smaller issues regarding readability, and minor clarifications are needed (mentioned below).

__ Correctness__: To me it seems so.

__ Clarity__: Generally yes. There is room of improvement regarding readability, I mention some points in the feedback section.

__ Relation to Prior Work__: Yes.

__ Reproducibility__: Yes

__ Additional Feedback__: - Related Work:
Is covered broadly. Since the method of the authors heavily relies on the three factors they propose, it would make sense to mention alternative factor representations of nodes/graphs in the literature.
- Approach:
The overall approach is interesting and makes sense to me. Yet, this section is very dense and many parts are confusing at first read. Maybe the authors could try to ease reading visually by using an enumeration in each paragraph in Sec. 4.2, with one item per channel.
- Experiments:
* The considered baselines are reasonable and cover a wide range of alternative methods.
* Notably, the authors propose a set of datasets for subgraph classification.
* The HPO tasks look slightly artificial to me. Although this is ok since you can use them for benchmarking, I would be interested if the authors know about if this kind of classification is actually done in practice, or if there's a need for such approaches.
* The results show that the proposed model clearly outperforms existing methods and thus presents a good start in exploring subgraph classification as a new area.
------------------------------------------
Some minor issues:
* l.180 add a \mid or comma after z_c
* Table 1 is not self-explaining, e.g., "Identity of S’s internal nodes" and "Identity of S’s border nodes" are not directly clear.
* l.258 "subgraphs are defined by the Biological Process Ontology from MSigDB"
-> I do not understand what exactly is a subgraph in this setting.
* Similarly, the authors could give an idea how the EM User graph looks like. Since it is no standard KG/domain used in the GNN area, it is hard to imagine how the subgraphs are connected if you do not know it.
* The authors refer to the appendix for details about the datasets. I would suggest to add to the paper at least a few short notes what is specific/different for each. For example, for standard graphs, we have that KGs are rather sparse, biomedical graphs are rather dense, etc. Do the proposed datasets show and/or differ in such characteristics too?
* Bibliography contains references to arxiv papers, while the papers actually have been published at official venues, e.g., [50], [51]

__ Summary and Contributions__: I acknowledge the rebuttal. The reviewers provided more information about efficiency in the paper. I raised my vote to accept.
==========================================================
The authors proposed a subgraph neural network to learn disentangled subgraph representations. Three channels are designed to capture different aspects of subgraph topology. Experiments results on both synthetic and real-world datasets demonstrate the effectiveness of the proposed model in subgraph classification.

__ Strengths__: 1) The authors studied a relevant problem and the proposed approach is novel and looks reasonable.
2) The experimental results in both synthetic and real-world datasets show the effectiveness of the proposed model.
3) The source code is provided to make the work reproducible.

__ Weaknesses__: My only concern lies in efficiency. Although the authors mentioned precomputed similarities can speed up the optimization, subgraph similarity calculation is time-consuming. Also it would be interesting to show the speedup of the precomputing strategy empirically. Besides, it would be useful to report not only the computational complexity but also the running time.

__ Correctness__: Yes

__ Clarity__: Yes

__ Relation to Prior Work__: Yes

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: The reviewers answered my main complaint: They have run the baseline I had in mind but had incorrectly labeled it in the paper. I am raising my vote to accept.
-----------
Presents a graph neural network that accepts a graph, and a "query" (subset of the powerset of the graph), and returns one graph label for each subgraph in the query.
The writing does not say it in this concise way above, does not consider simple baselines (run GCN on each subgraph), and the overall system has many components, functions and heuristics, making it an implementation paper, more than a general advancement.

__ Strengths__: * Point-out a missing gap in literature (subgraph prediction); though trivial baselines not evaluated (see Weakness #2).
* Apply on science datasets (biology and chemistry)

__ Weaknesses__: * Why is baseline (7) "pretrained"? To me, that is the most straight-forward implementation (treat each graph like a subgraph) and run it through GNN (e.g. GCN of Kipf & Welling)
* The function that actually does the random walk (biased random walk) is a heuristic. It needs to be separately tuned.
* The writing is too verbose and hard to follow. It looks like an implementation paper and not-so-much a theoretical nor a general advancement. This is probably the biggest weakness IMO. I suggest you focus on a main idea or two and show analyses why they work, rather than a well-engineered system with many functions and things to be designed/considered.

__ Correctness__: Equation 2 is hard to interpret because subscript "c" on the left is not used on the right.
Typos:
Line 42: "our region" --> "one region" ?
Line 76: Grammar mistake
Line 118: Add $\in \mathcal{S}$ before the period for exactness.
Other typos: Line 131

__ Clarity__: The paper is too verbose and the story-line is not smooth to follow. Math is often incomplete (why not specify domains and ranges for all functions?) and definitions are not exact.

__ Relation to Prior Work__: Yes! I think they covered all relevant aspects and the related work section is very well written, thank you!

__ Reproducibility__: No

__ Additional Feedback__: Reproducibility: Since the model is very complicated, I dont think I am able to implement their model by reading their paper (For reference, I have well-cited ML-graph papers at ICML&NeurIPS)
If other reviewers disagree: e.g. a student in their lab can implement and get the results, from this state of the writing, then I am willing to change my mind about this.