NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Reviewer 1
Update after author response: The authors have addressed my concerns regarding learning the costs and the unsupervised costs associated with degrees. I'd like to see these formulations in the final version. The paper's contribution includes both the use of OT for graph compression and the specific optimization based on a Boolean relaxation. For the latter, it may be beneficial to compare with a greedy Frank-Wolfe approach, which can handle both the support-cardinality constraint and the simplex constraint (see the sketch at the end of this review). The revision could comment on the feasibility of this, including a full line search and the fully-corrective variant of Frank-Wolfe.

_______________________

Overall, this is a good paper that solves an interesting problem in a novel but principled way, and along the way solves a general subproblem. It appears to be original work, with clear significance for certain forms of network analysis.

With regard to downstream tasks, there seems to be a heuristic step of setting the costs appropriately to preserve the relevant information. Can a more explicit linkage to the costs be made if training labels are available, or is this simply not feasible? In some respects, the experiments are a bit of a straw man, since the cost function is defined in supervised terms of node labels, which I assume are related to the graph label; in practice the user is left with this choice. Although learning the cost is mentioned (Lines 202–207), how would this be done in practice (as an outer loop)? Would it scale? Would it overfit? Is there a way to relate the optimal transport to other compression techniques if the costs are defined without extrinsic information, say using only node degree, path length, etc.?

Minor points for clarity:
- Clarify what is meant by "features of a graph" at line 15.
- Line 45: "0" -> "zero".
- Line 48: the assumptions are not clear in context.
- Line 62: clarify that mixed graphs contain both directed and undirected edges.
- It would be useful to have the baseline performance of the uncompressed graph in Table 1.
- Line 181: clearly explain the relationship between node labels and graph labels in the experimental data section.
- Explain the Laplacian quadratic reference at line 22 a bit more in the text.
- Remove the parentheses around bracketed references, especially in the introduction.
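For concreteness, here is a minimal Frank-Wolfe sketch with exact line search over the probability simplex, applied to a toy quadratic objective. The objective, the name frank_wolfe_simplex, and its parameters are illustrative assumptions, not the paper's relaxed OT program:

```python
import numpy as np

def frank_wolfe_simplex(Q, b, n_iters=200, tol=1e-10):
    """Frank-Wolfe with exact line search for the toy quadratic
    f(x) = 0.5 * x @ Q @ x + b @ x over the probability simplex.

    Hypothetical stand-in objective, not the paper's relaxed OT program.
    The linear minimization oracle over the simplex returns a single
    vertex, so starting from a vertex the iterate after k steps has at
    most k + 1 nonzeros, which is why FW is a natural candidate for the
    support-cardinality constraint.
    """
    n = len(b)
    x = np.zeros(n)
    x[0] = 1.0                        # start at a vertex: iterates stay sparse
    for _ in range(n_iters):
        g = Q @ x + b                 # gradient of f at x
        s = np.zeros(n)
        s[np.argmin(g)] = 1.0         # LMO: best simplex vertex e_i
        d = s - x
        if -(g @ d) < tol:            # Frank-Wolfe duality gap ~ 0: converged
            break
        dQd = d @ Q @ d
        # exact ("full") line search on [0, 1]; closed form for a quadratic
        gamma = 1.0 if dQd <= 0.0 else min(1.0, max(0.0, -(g @ d) / dQd))
        x = x + gamma * d
    return x
```

The fully-corrective variant would additionally re-optimize f over the convex hull of all vertices selected so far at each step, trading extra per-iteration work for sparser and more accurate iterates.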
Reviewer 2
This paper proposes a new approach to graph compression via optimal transport (OT). Specifically, it adds an l_0-norm constraint to the linear program induced by OT and relaxes it to a problem involving projection onto a simplex (sketched below). The resulting problem can be solved by a customized mirror prox method. Empirical studies are also provided.

Following are my concerns:

1. Theorem 4 presents a condition under which the relaxation recovers the optimal solution of the original problem. It would be better to provide more discussion, e.g., on the mildness of the requirements.

2. Figure 1 claims high efficiency for the proposed method, but the experimental settings and implementation details are missing, so it is not clear whether the comparisons are fair.
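For reference, the projection step can be made concrete with the classic Euclidean projection onto the probability simplex (Held et al., 1974; Duchi et al., 2008). The sketch below is illustrative only; the name project_simplex is assumed, and the paper's customized mirror prox may operate in a different (e.g., entropic) geometry:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1} via the
    standard sort-and-threshold rule. Illustrative sketch only; the
    cardinality constraint in the paper is handled separately by the
    Boolean relaxation.
    """
    u = np.sort(v)[::-1]                                 # entries in descending order
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]   # last index kept positive
    theta = (1.0 - css[rho]) / (rho + 1.0)               # uniform shift
    return np.maximum(v + theta, 0.0)
```

For example, project_simplex(np.array([0.3, -0.2, 1.4])) returns [0, 0, 1], the closest point on the simplex in Euclidean norm.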
Reviewer 3
I regard the contribution as novel and exciting, first of all from the methodological perspective. I also appreciate the clarity and high technical quality of the submission. Nevertheless, I have a few technical questions/concerns:

1. Experimental setup and Table 1. All experiments are performed on fairly small and very sparse graphs. To better understand the value and the limits of the approach, it would be great to have more experiments on larger datasets. If larger datasets are available, do you have experiments with a higher compression ratio? What are the results?

2. For the DHFR dataset, the results of the different methods are almost the same. Do you have any insight into this? Is there anything unique about this dataset?

Update after the authors' response: Many thanks to the authors for addressing my concerns. I have no remaining concerns regarding the paper.