Review for NeurIPS paper: Graph Cross Networks with Vertex Infomax Pooling

NeurIPS 2020

Graph Cross Networks with Vertex Infomax Pooling

Meta Review

The paper makes a novel contribution by introducing graph cross networks (GXN) and demonstrates their usefulness on practical examples. While there were initial concerns about the clarity of the paper, the reviewers found that the authors did a good job of summarizing their work and addressed most of the concerns in the rebuttal. The two key components of GXN are a novel vertex infomax pooling (VIPool), which creates multiscale graphs in a trainable manner, and a novel feature-crossing layer, which enables feature interchange across scales. The authors compared their work with prior methods and surpassed all of them, which meets the bar for a NeurIPS presentation.

While it does not impact the decision, the following points were left unanswered during the discussion, and it would be great if the authors could address them in their revision:
(1) In VIPool, P_v, P_n, and P_{v,n} are all discrete distributions. Although the feature vectors can be continuous, there are only |V| nodes, so a sample from P_v can take at most |V| distinct values, which makes it a discrete distribution. For such discrete distributions, the mutual information can be computed directly from its definition in O(|V|^2) time without using neural estimators. Although a neural estimator may give a smoother estimate, it also has a much larger computational cost, so it seems important to better explain why, intuitively, these neural estimators are much better suited to this task.
(2) The authors claimed in their response that the cost of VIPool is O(|V|). As some understand it, however, VIPool greedily selects |\Omega| nodes, where |\Omega| is proportional to |V|. To select each node, VIPool needs to draw negative samples from the node set, so the cost per selected node is O(|V|). Therefore, the total theoretical cost is at least O(|V|^2) rather than O(|V|).
(3) To speed up pooling, the authors mentioned that negative sampling was used, but neither the number of negative samples nor the distribution they are drawn from was clearly stated. Both are important for replicating the results of the paper.
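
To make point (1) concrete, here is a minimal sketch of computing mutual information directly from its definition for a discrete joint distribution. It is not taken from the paper: the function name `mutual_information` and the toy tables are hypothetical, and the sketch assumes the joint distribution P_{v,n} is already available as a table of non-negative weights with one row per vertex and one column per neighborhood. How the paper would populate such a table is not specified here.

```python
import math


def mutual_information(joint):
    """Plug-in mutual information (in nats) of a discrete joint distribution.

    `joint` is a rectangular table (list of lists) of non-negative weights;
    it is normalized internally, so raw counts are accepted as well.
    """
    total = sum(sum(row) for row in joint)
    n_rows, n_cols = len(joint), len(joint[0])

    # Marginals: P_v over rows, P_n over columns (names follow the review).
    p_row = [sum(row) / total for row in joint]
    p_col = [sum(joint[i][j] for i in range(n_rows)) / total
             for j in range(n_cols)]

    mi = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            p = joint[i][j] / total
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log(p / (p_row[i] * p_col[j]))
    return mi


# Toy tables (illustrative only, not data from the paper).
independent = [[1, 1], [1, 1]]   # row and column variables are independent
dependent = [[1, 0], [0, 1]]     # row fully determines column
mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
```

Each cell of the table is visited once, so for a |V| x |V| table the cost is O(|V|^2), which matches the bound stated in point (1). Whether such a plug-in estimate is adequate for VIPool is exactly the question the authors are asked to address.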