NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 6006
Title: Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks

Reviewer 1

* Originality: The method is novel. The idea of sampling based on the upper layers is interesting in itself.
* Quality: The work rests on solid arguments and theoretical proofs. The experiments are thoughtful, although they lack analysis.
* Clarity: The paper is in general easy to read (though I didn't check the maths carefully).
  - Table 3: Isn't LADIES supposed to be an approximation of the original GCN? If so, why does LADIES outperform Full-Batch in terms of F1?
* Significance: This work will have a high impact because of its simplicity, solidity, and strong performance.

Reviewer 2

The topic of the paper is interesting. The paper seems technically sound and generally well written. However, it is quite difficult to read, in my opinion. I have only one suggestion, aimed at avoiding confusion.
- Similar names can cause misunderstandings. The name/title "Layered Importance Sampling" reminds me of the following Monte Carlo sampling approach: L. Martino, V. Elvira, D. Luengo, J. Corander, "Layered Adaptive Importance Sampling", Statistics and Computing, Volume 27, Issue 3, Pages 599-623, 2017. I think you should clarify that your framework is different. More generally, "importance sampling" has a clearly different meaning in the Monte Carlo community, where it refers to a specific algorithm.
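For reference, the Monte Carlo sense of "importance sampling" can be illustrated with a minimal sketch. The target distribution, the proposal, and the tail-probability task below are assumptions chosen purely for illustration; they do not come from the paper under review or from the cited article.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Target p = N(0, 1); we want the tail probability P(X > 3).
# Proposal q = N(3, 1) places most of its samples in the region of interest.
x = rng.normal(loc=3.0, scale=1.0, size=n)

# Importance weights w(x) = p(x) / q(x). The normalizing constants cancel
# because both densities have unit variance.
log_w = -0.5 * x**2 + 0.5 * (x - 3.0) ** 2
weights = np.exp(log_w)

# Importance sampling estimate of E_p[1{X > 3}].
estimate = np.mean((x > 3.0) * weights)

# Closed-form reference value for comparison.
exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))
print(estimate, exact)
```

Here the samples are drawn from a proposal and reweighted to estimate an expectation under the target, which is the specific algorithm the Monte Carlo literature means by the term.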

Reviewer 3

This paper focuses on solving the sparsity issue in the importance sampling of FastGCN and proposes a layer-dependent importance sampling scheme. However, the main modification is to sample neighbors, according to a defined probability, from the union of the neighborhoods of the nodes in the upper layer. This is a small modification of FastGCN: for mini-batch training, it is straightforward to improve sampling efficiency by limiting the candidate nodes to the neighborhood of the nodes in the upper layer. In fact, on this point claimed by the authors, it is just a trick in the implementation of FastGCN, and it cannot be considered a novel improvement at the level of NeurIPS, even though you have conducted the corresponding experiments.
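The candidate restriction discussed above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `sample_lower_layer`, the dense toy adjacency matrix, and the choice of probabilities proportional to squared column norms of the row-restricted adjacency are all hypothetical here, since the review does not specify the exact probability used by the authors.

```python
import numpy as np

def sample_lower_layer(adj, upper_nodes, num_samples, rng):
    """Sample lower-layer nodes from the union neighborhood of upper_nodes.

    Hypothetical helper: probabilities are assumed proportional to the
    squared column norms of the adjacency rows of the upper-layer nodes.
    """
    rows = adj[upper_nodes, :]             # rows of the upper-layer nodes only
    col_weights = (rows ** 2).sum(axis=0)  # zero for nodes outside the union neighborhood
    candidates = np.flatnonzero(col_weights)
    probs = col_weights[candidates] / col_weights[candidates].sum()
    k = min(num_samples, candidates.size)
    picked = rng.choice(candidates, size=k, replace=False, p=probs)
    return picked, candidates, probs

# Toy undirected graph with 5 nodes (an assumption for illustration).
adj = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

rng = np.random.default_rng(0)
picked, candidates, probs = sample_lower_layer(adj, [0, 1], 3, rng)
print(picked, candidates, probs)
```

With upper-layer nodes 0 and 1, node 4 is never a candidate because it is not adjacent to either of them, which is the sparsity-avoiding effect under discussion.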