Reviews: (Nearly) Efficient Algorithms for the Graph Matching Problem on Correlated Random Graphs

I find the problem to be reasonable well motivated and the work non trivial. Analyzing subgraph counts is usually difficult and this work is no exception. The construction of the family of subgraph is novel and may find applications elsewhere. The paper is well written and the authors do a good job in communicating their ideas in a coherent and understandable fashion. My biggest concern is the disconnect between the theory and experiments. The theory uses a complicated list of motifs whereas the experiments use a set of five subgraphs. The theory uses random graphs whereas one experiment consider graphs from facebook which are very distinct from random graphs in many aspects (power law degree distribution, clustering and so on). I find that the assertion "We believe our techniques are of independent interest and applicability beyond the correlated ER model" to be not convincing with weak evidence. IMHO more work is needed to justify this assertion. It is not clear (to me) whether the hypothesis testing problem has been proposed before. The authors should clarify this in the text. Also for finding the permutation minimizing the error, it is not clear to me that the problem with base graph chosen in a worse-case manner (as opposed to a random graph) is NP-hard. It would be good to provide a proof (or mention this is not known). Some more comments: L4: vision--> computer vision Equation 1: I was not sure what || ||_0 means. I would specify this. L 29: In this paper--> consider deleting L 43-50. The authors repeat the applications of the problems several times in several different places. I would remove some of these. L 54: related works--> related work L 70: By no means easy to establish: I would consider rephrasing. L 72: But have not been able to so. This is a rather limited evidence for the difficulty of the problem. I would remove it. L 78-81 As I said the suggestion of relevance documented by the experimental evidence is weak at best. I think more work is needed to justify these claims. L 82-83: very much: delete L 98: Simple percolation is not very informative. I would add more details. L 155: , this can be used: Sentence needs revision. Beginning of section 2 and elsewhere. If the statement holds for all epsilon then does it hold with probability $1-o(1)$? Theorem 3.1: Specify that $\delta$ is in $(0,1)$. Also does the result hold for $\gamma<<1$ (with worst running time)? Figure 3: text is very small making it very cryptic to read. It would be helpful to have information about the running time of the various algorithms. Different benchmark algorithms where chosen for different simulations (NN for one but not the others). Justification of this or using the same algorithms throughout is recommended. References has multiple typos: It is not sorted alphabetically. [13] should have Erdos Renyi capitalized. [33] g_{n,p} need to be corrected.

This paper studies a matching problem on correlated random graphs. In this model, a Erdos-Renyi graph $Gt$ is first sampled. Then two sugbraphs are sampled from $G$ (with vertex labels shuffled). This paper considers two problem: (i) hypothesis testing: in H_0, two independent Erdos-Renyi graphs are given; in H_1, two graphs are sampled from the correlated random graph model. The goal is to design a randomized algorithm to determine whether we are in H_0 and H_1. (ii) matching: given two graphs sampled from the correlated random graph model, design an algorithm to match the nodes from two subgraphs. I find the problem well motivated. First, this problem is related to the graph isomorphism problem, which is notoriously difficult in the worst case. Thus, it is natural and important to study "average case" behaviors (i.e., design algorithms under random graph models). Second, the authors mentioned a number of settings (i.e., de-anonymize social graphs), in which the new algorithmic techniques developed in the papers can be potentially applicable. The new algorithmic techniques developed by the authors are also very interesting. As mentioned by the authors, "simple statistics" (e.g., degree distribution, eigenvalues/eigenvectors) of a random graph is usually highly concentrated so they are unsuitable for hypothesis testing. The authors propose that they can design a family of subgraphs so that these subgraphs are unlikely to be encountered in a random graph (and even less likely to appear in two independent random graphs). Then by counting the numbers of these subgraphs in the input graphs, we can determine whether H_0 needs to be rejected. The technique can be generalized to design a matching algorithm. Designing the family of subgraphs for counting seems to be highly non-trivial for a number of reasons: (i) the graphs need to be "unnatural" so that they appear with low probability; indeed, the authors find an interesting way to build paths on top of random graphs to make them "unnatural"; and (ii) the size of these subgraphs need to be a constant (to make sure the running time is polynomial). It is usually harder to manipulate constant size random graphs to get "high probability" results. In summary, I find this paper studies a very interesting theoretical model and proposes a new algorithmic technique that can potentially have practical impact. It is a typical theory paper in first tier ML conferences such as Neurips/COLT.

Paper ID:	4930
Title:	(Nearly) Efficient Algorithms for the Graph Matching Problem on Correlated Random Graphs

Reviewer 1

Reviewer 2

Reviewer 3