{"title": "Scalable Gromov-Wasserstein Learning for Graph Partitioning and Matching", "book": "Advances in Neural Information Processing Systems", "page_first": 3052, "page_last": 3062, "abstract": "We propose a scalable Gromov-Wasserstein learning (S-GWL) method and establish a novel and theoretically-supported paradigm for large-scale graph analysis.\nThe proposed method is based on the fact that Gromov-Wasserstein discrepancy is a pseudometric on graphs. \nGiven two graphs, the optimal transport associated with their Gromov-Wasserstein discrepancy provides the correspondence between their nodes and achieves graph matching. \nWhen one of the graphs is a predefined graph with isolated but self-connected nodes ($i.e.$, disconnected graph), the optimal transport indicates the clustering structure of the other graph and achieves graph partitioning. \nFurther, we extend our method to multi-graph partitioning and matching by learning a Gromov-Wasserstein barycenter graph for multiple observed graphs. \nOur method combines a recursive $K$-partition mechanism with a warm-start proximal gradient algorithm, whose time complexity is $\\mathcal{O}(K(E+V)\\log_K V)$ for graphs with $V$ nodes and $E$ edges. 
\nTo our knowledge, our method is the first attempt to make Gromov-Wasserstein discrepancy applicable to large-scale graph analysis and unify graph partitioning and matching into the same framework.\nIt outperforms state-of-the-art graph partitioning and matching methods, achieving a trade-off between accuracy and efficiency.", "full_text": "Scalable Gromov-Wasserstein Learning for\n\nGraph Partitioning and Matching\n\nHongteng Xu1,2\n\nDixin Luo2\n\nLawrence Carin2\n\n1In\ufb01nia ML Inc.\n\n2Duke University\n\n{hongteng.xu, dixin.luo, lcarin}@duke.edu\n\nAbstract\n\nWe propose a scalable Gromov-Wasserstein learning (S-GWL) method and estab-\nlish a novel and theoretically-supported paradigm for large-scale graph analysis.\nThe proposed method is based on the fact that Gromov-Wasserstein discrepancy\nis a pseudometric on graphs. Given two graphs, the optimal transport associated\nwith their Gromov-Wasserstein discrepancy provides the correspondence between\ntheir nodes and achieves graph matching. When one of the graphs has isolated\nbut self-connected nodes (i.e., a disconnected graph), the optimal transport indi-\ncates the clustering structure of the other graph and achieves graph partitioning.\nUsing this concept, we extend our method to multi-graph partitioning and match-\ning by learning a Gromov-Wasserstein barycenter graph for multiple observed\ngraphs; the barycenter graph plays the role of the disconnected graph, and since\nit is learned, so is the clustering. Our method combines a recursive K-partition\nmechanism with a regularized proximal gradient algorithm, whose time complexity\nis O(K(E + V ) logK V ) for graphs with V nodes and E edges. To our knowledge,\nour method is the \ufb01rst attempt to make Gromov-Wasserstein discrepancy applicable\nto large-scale graph analysis and unify graph partitioning and matching into the\nsame framework. 
It outperforms state-of-the-art graph partitioning and matching\nmethods, achieving a trade-off between accuracy and ef\ufb01ciency.\n\nIntroduction\n\n1\nGromov-Wasserstein distance [42, 29] was originally designed for metric-measure spaces, which can\nmeasure distances between distributions in a relational way, deriving an optimal transport between\nthe samples in distinct spaces. Recently, the work in [11] proved that this distance can be extended to\nGromov-Wasserstein discrepancy (GW discrepancy) [37], which de\ufb01nes a pseudometric for graphs.\nAccordingly, the optimal transport between two graphs indicates the correspondence between their\nnodes. This work theoretically supports the applications of GW discrepancy to structural data analysis,\ne.g., 2D/3D object matching [30, 28, 8], molecule analysis [43, 44], network alignment [49], etc.\nUnfortunately, although GW discrepancy-based methods are attractive theoretically, they are often\ninapplicable to large-scale graphs, because of high computational complexity. Additionally, these\nmethods are designed for two-graph matching, ignoring the potential of GW discrepancy to other\ntasks, like graph partitioning and multi-graph matching. As a result, the partitioning and the matching\nof large-scale graphs still typically rely on heuristic methods [16, 12, 45, 27], whose performance is\noften sub-optimal, especially in noisy cases.\nFocusing on the issues above, we design a scalable Gromov-Wasserstein learning (S-GWL) method\nand establish a new and uni\ufb01ed paradigm for large-scale graph partitioning and matching. As\nillustrated in Figure 1(a), given two graphs, the optimal transport associated with their Gromov-\nWasserstein discrepancy provides the correspondence between their nodes. Similarly, graph partition-\ning corresponds to calculating the Gromov-Wasserstein discrepancy between an observed graph and\na disconnected graph, as shown in Figure 1(b). 
The optimal transport connects each node of the observed graph with an isolated node of the disconnected graph, yielding a partitioning. In Figures 1(c)\nand 1(d), taking advantage of the Gromov-Wasserstein barycenter in [37], we achieve multi-graph\nmatching and partitioning by learning a \u201cbarycenter graph\u201d. For any two or more graphs, the\ncorrespondence (or the clustering structure) among their nodes can be established indirectly through\ntheir optimal transports to the barycenter graph.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nFigure 1 panels: (a) Graph matching; (b) Graph partitioning; (c) Multi-graph matching; (d) Multi-graph partitioning; (e) Comparisons on accuracy and efficiency.\n\nFigure 1: (a)-(d) Illustrations of graph partitioning and matching in the GWL framework. (c, d) The barycenter\ngraph in black and its optimal transports to observed graphs are learned jointly. (d) When the barycenter graph\nis initialized as a graph with few isolated nodes, the optimal transports indicate aligned partitions of the observed\ngraphs. (e) We test various graph matching methods in 10 trials on an Intel i7 CPU. In each trial, the source graph\nhas 2,000 nodes and the target graph has 100 more noisy nodes and corresponding edges. The graphs follow either a\nGaussian partition model [7] or a Barab\u00e1si-Albert model [4]. The GWL-based methods (\u2018?\u2019) obtain higher node\ncorrectness than other baselines (\u2018\u2022\u2019), and our S-GWL (big \u2018?\u2019) achieves a trade-off between accuracy and efficiency.\n\nThe four tasks in Figures 1(a)-1(d) are explicitly unified in our Gromov-Wasserstein learning (GWL)\nframework, which casts all of them as the same GW discrepancy-based optimization problem. 
To improve\nits scalability, we introduce a recursive mechanism to the GWL framework, which recursively applies\nK-way partitioning to decompose large graphs into a set of aligned sub-graph pairs, and then matches\neach pair of sub-graphs. When calculating GW discrepancy, we design a regularized proximal\ngradient method that considers the prior information of nodes and performs updates by solving a\nseries of convex sub-problems. The sparsity of edges further helps us reduce computations. These\nacceleration strategies yield our S-GWL method: for graphs with V nodes and E edges, its time\ncomplexity is $\\mathcal{O}(K(E + V) \\log_K V)$ and memory complexity is $\\mathcal{O}(E + VK)$. To our knowledge,\nour S-GWL is the first to make GW discrepancy applicable to large-scale graph analysis. Figure 1(e)\nillustrates the effectiveness of S-GWL on graph matching, with more results presented in Section 5.\n2 Graph Analysis Based on Gromov-Wasserstein Learning\nDenote a measure graph as $G(\\mathcal{V}, C, \\mu)$, where $\\mathcal{V} = \\{v_i\\}_{i=1}^{|\\mathcal{V}|}$ is the set of nodes, $C = [c_{ij}] \\in \\mathbb{R}^{|\\mathcal{V}| \\times |\\mathcal{V}|}$\nis the adjacency matrix, and $\\mu = [\\mu_i] \\in \\Sigma^{|\\mathcal{V}|}$ is a Borel probability measure defined on $\\mathcal{V}$. The\nadjacency matrix is continuous-valued for a weighted graph and binary for an unweighted graph. In practice,\n$\\mu$ is an empirical distribution of nodes, which can be estimated by a function of node degree. A\nK-way graph partitioning aims to decompose a graph G into K sub-graphs by clustering its nodes,\ni.e., $\\{G_k = G(\\mathcal{V}_k, C_k, \\mu_k)\\}_{k=1}^{K}$, where $\\cup_k \\mathcal{V}_k = \\mathcal{V}$ and $\\mathcal{V}_k \\cap \\mathcal{V}_{k'} = \\emptyset$ for $k \\neq k'$. Given two graphs\n$G_s$ and $G_t$, graph matching aims to find a correspondence between their nodes, i.e., $\\pi : \\mathcal{V}_s \\mapsto \\mathcal{V}_t$.\nMany real-world networks are modeled using graph theory, and graph partitioning and matching are\nimportant for community detection [21, 16] and network alignment [39, 40, 54], respectively. 
In this\nsection, we propose a Gromov-Wasserstein learning framework to unify these two problems.\n2.1 Gromov-Wasserstein discrepancy between graphs\nOur GWL framework is based on a pseudometric on graphs called Gromov-Wasserstein discrepancy:\nDefinition 2.1 ([11]). Denote the collection of measure graphs as $\\mathcal{G}$. For each $p \\in [1, \\infty]$ and each\n$G_s, G_t \\in \\mathcal{G}$, the Gromov-Wasserstein discrepancy between $G_s$ and $G_t$ is\n$$d_{gw}(G_s, G_t) := \\min_{T \\in \\Pi(\\mu_s, \\mu_t)} \\Big( \\sum_{i,j \\in \\mathcal{V}_s} \\sum_{i',j' \\in \\mathcal{V}_t} |c^s_{ij} - c^t_{i'j'}|^p \\, T_{ii'} T_{jj'} \\Big)^{1/p}, \\quad (1)$$\nwhere $\\Pi(\\mu_s, \\mu_t) = \\{T \\geq 0 \\mid T 1_{|\\mathcal{V}_t|} = \\mu_s, \\; T^\\top 1_{|\\mathcal{V}_s|} = \\mu_t\\}$.\nGW discrepancy compares graphs in a relational way, measuring how the edges in a graph compare\nto those in the other graph. It is a natural extension of the Gromov-Wasserstein distance defined for\nmetric-measure spaces [29]. We refer the reader to [29, 11, 36] for mathematical foundations.\nGraph matching According to the definition, GW discrepancy measures the distance between two\ngraphs, and the optimal transport $T = [T_{ij}] \\in \\Pi(\\mu_s, \\mu_t)$ is a joint distribution of the graphs\u2019 nodes:\n$T_{ij}$ indicates the probability that the node $v^s_i \\in \\mathcal{V}_s$ corresponds to the node $v^t_j \\in \\mathcal{V}_t$. As shown in\nFigure 1(a), the optimal transport achieves an assignment of the source nodes to the target ones.\nGraph partitioning Besides graph matching, this paradigm is also suitable for graph partitioning.\nRecall that most existing graph partitioning methods obey the modularity maximization principle [16,\n12]: for each partitioned sub-graph, its internal edges should be dense, while its external edges\nconnecting with other sub-graphs should be sparse. This principle implies that if we treat each\nsub-graph as a \u201csuper node\u201d [21, 47, 34], an ideal partitioning should correspond to a disconnected\ngraph with K isolated, but self-connected super nodes. 
Therefore, we achieve K-way partitioning\nby calculating the GW discrepancy between the observed graph G and a disconnected graph, i.e.,\n$d_{gw}(G, G_{dc})$, where $G_{dc} = G(\\mathcal{V}_{dc}, \\mathrm{diag}(\\mu_{dc}), \\mu_{dc})$ and $|\\mathcal{V}_{dc}| = K$. Here $\\mu_{dc} \\in \\Sigma^{K}$ is a node distribution,\nwhose derivation is in Appendix A.1, and $\\mathrm{diag}(\\mu_{dc})$ is the adjacency matrix of $G_{dc}$. As shown in\nFigure 1(b), the optimal transport is a $|\\mathcal{V}| \\times K$ matrix. The column index of the maximum in each row of the matrix\nindicates the cluster of the corresponding node.\n2.2 Gromov-Wasserstein barycenter graph for analysis of multiple graphs\nMulti-graph matching Distinct from most graph matching methods [17, 13, 39, 14], which mainly\nfocus on two-graph matching, our GWL framework can be readily extended to multi-graph cases,\nby introducing the Gromov-Wasserstein barycenter (GWB) proposed in [37]. Given a set of graphs\n$\\{G_m\\}_{m=1}^{M}$, their p-order Gromov-Wasserstein barycenter is a barycenter graph defined as\n$$G(\\bar{\\mathcal{V}}, \\bar{C}, \\bar{\\mu}) := \\arg\\min_{\\bar{G}} \\sum_{m=1}^{M} \\omega_m d_{gw}^{p}(G_m, \\bar{G}), \\quad (2)$$\nwhere $\\omega = [\\omega_m] \\in \\Sigma^{M}$ contains predefined weights, and $\\bar{G} = G(\\bar{\\mathcal{V}}, \\bar{C} \\in \\mathbb{R}^{|\\bar{\\mathcal{V}}| \\times |\\bar{\\mathcal{V}}|}, \\bar{\\mu} \\in \\Sigma^{|\\bar{\\mathcal{V}}|})$ is the\nbarycenter graph with a predefined number of nodes. The barycenter graph minimizes the weighted\naverage of its GW discrepancy to observed graphs. It is an average of the observed graphs aligned by\ntheir optimal transports. The matrix $\\bar{C}$ is a \u201csoft\u201d adjacency matrix of the barycenter. Its elements\nreflect the confidence of the edges between the corresponding nodes in $\\bar{\\mathcal{V}}$. As shown in Figure 1(c),\nthe barycenter graph works as a \u201creference\u201d connecting with the observed graphs. For each node\nin the barycenter graph, we can find its matched nodes in different graphs with the help of the\ncorresponding optimal transport. These matched nodes construct a node set, and any two nodes\nin the set form a correspondence. 
The collection of all the node sets achieves multi-graph matching.\nMulti-graph partitioning We can also use the barycenter graph to achieve multi-graph partitioning,\nwith the learned barycenter graph playing the role of the aforementioned disconnected graph. Given\ntwo or more graphs, whose nodes may have unobserved correspondences, existing partitioning\nmethods [21, 16, 12, 6, 34] only partition them independently because they are designed for clustering\nnodes in a single graph. As a result, the first cluster of a graph may correspond to the second cluster\nof another graph. Without the correspondence between clusters, we cannot reduce the search space\nin matching tasks. Although this correspondence can be estimated by matching two coarse graphs\nthat treat the clusters as their nodes, this strategy not only introduces additional computations but\nalso leads to more uncertainty on matching, because different graphs are partitioned independently\nwithout leveraging structural information from each other. By learning a barycenter graph for multiple\ngraphs, we can partition them and align their clusters simultaneously. As shown in Figure 1(d), when\napplying K-way multi-graph partitioning, we initialize a disconnected graph with K isolated nodes\nas the barycenter graph, and then learn it by $\\min_{\\bar{G}} \\sum_{m=1}^{M} \\omega_m d_{gw}^{p}(G_m, \\bar{G})$. For each node of the\nbarycenter graph, its matched nodes in each observed graph belong to the same cluster.\n3 Scalable Gromov-Wasserstein Learning\nBased on Gromov-Wasserstein discrepancy and the barycenter graph, we have established a GWL\nframework for graph partitioning and matching. 
To make this framework scalable to large graphs, we\npropose a regularized proximal gradient method to calculate GW discrepancy and integrate multiple\nacceleration strategies to greatly reduce the computational complexity of GWL.\n3.1 Regularized proximal gradient method\nInspired by the work in [48, 49], we calculate the GW discrepancy in (1) based on a proximal gradient\nmethod, which decomposes a complicated non-convex optimization problem into a series of convex\nsub-problems. For simplicity, we set p = 2 in (1, 2). Given two graphs $G_s = G(\\mathcal{V}_s, C_s, \\mu_s)$ and\n$G_t = G(\\mathcal{V}_t, C_t, \\mu_t)$, in the n-th iteration, we update the current optimal transport $T^{(n)}$ by calculating\n$d_{gw}^{2}(G_s, G_t)$:\n$$T^{(n+1)} = \\arg\\min_{T \\in \\Pi(\\mu_s, \\mu_t)} \\sum_{i,j \\in \\mathcal{V}_s} \\sum_{i',j' \\in \\mathcal{V}_t} |c^s_{ij} - c^t_{i'j'}|^2 \\, T^{(n)}_{ii'} T_{jj'} + \\gamma \\mathrm{KL}(T \\| T^{(n)}) = \\arg\\min_{T \\in \\Pi(\\mu_s, \\mu_t)} \\langle L(C_s, C_t, T^{(n)}), T \\rangle + \\gamma \\mathrm{KL}(T \\| T^{(n)}). \\quad (3)$$\nHere, $L(C_s, C_t, T) = C_s \\mu_s 1_{|\\mathcal{V}_t|}^\\top + 1_{|\\mathcal{V}_s|} \\mu_t^\\top C_t^\\top - 2 C_s T C_t^\\top$, derived based on [37], and\n$\\langle \\cdot, \\cdot \\rangle$ represents the inner product of two matrices. The Kullback-Leibler (KL) divergence, i.e.,\n$\\mathrm{KL}(T \\| T^{(n)}) = \\sum_{ij} T_{ij} \\log(T_{ij} / T^{(n)}_{ij}) - T_{ij} + T^{(n)}_{ij}$, is added as the proximal term. We can solve\n(3) via the Sinkhorn-Knopp algorithm [41, 15] with nearly-linear convergence [1]. As demonstrated\nin [49], the global convergence of this proximal gradient method is guaranteed, so repeating (3) leads\nto a stable optimal transport, denoted as $\\hat{T}$. Additionally, this method is robust to the hyperparameter $\\gamma$,\nachieving better convergence and numerical stability than the entropy-based method in [37].\nLearning the barycenter graph is also based on the proximal gradient method. Given M graphs, we\nestimate their barycenter graph via alternating optimization. In the n-th iteration, given the previous\nbarycenter graph $\\bar{G}^{(n)} = G(\\bar{\\mathcal{V}}, \\bar{C}^{(n)}, \\bar{\\mu})$, we update M optimal transports via solving (3). 
Given the\nupdated optimal transports $\\{T_m^{(n+1)}\\}_{m=1}^{M}$, we update the adjacency matrix of the barycenter graph by\n$$\\bar{C}^{(n+1)} = \\frac{1}{\\bar{\\mu} \\bar{\\mu}^\\top} \\sum_{m} \\omega_m (T_m^{(n+1)})^\\top C_m T_m^{(n+1)}, \\quad (4)$$\nwhere the division by $\\bar{\\mu} \\bar{\\mu}^\\top$ is elementwise.\nThe weights $\\omega$, the number of the nodes $|\\bar{\\mathcal{V}}|$ and the node distribution $\\bar{\\mu}$ are predefined.\nDifferent from the work in [49, 37], we use the following initialization strategies to achieve a\nregularized proximal gradient method and estimate optimal transports with few iterations.\nNode distributions We estimate the node distribution $\\mu$ of a graph empirically by a function of node\ndegree, which reflects the local topology of nodes, e.g., the density of neighbors. In particular, for a\ngraph with $|\\mathcal{V}|$ nodes, we first calculate a vector of node degree, i.e., $\\boldsymbol{n} = [n_i] \\in \\mathbb{Z}^{|\\mathcal{V}|}$, where $n_i$ is\nthe number of neighbors of the i-th node. Then, we estimate the node distribution $\\mu$ as\n$$\\mu = \\tilde{\\mu} / \\|\\tilde{\\mu}\\|_1, \\quad \\tilde{\\mu} = (\\boldsymbol{n} + a)^b, \\quad (5)$$\nwhere $a \\geq 0$ and $b \\geq 0$ are the hyperparameters controlling the shape of the distribution. For the\ngraphs with isolated nodes, whose $n_i$\u2019s are zeros, we set a > 0 to avoid numerical issues when solving\n(3). For the graphs whose node degrees obey power-law distributions, e.g., Barab\u00e1si-Albert graphs, we\nset $b \\in [0, 1)$ to balance the probabilities of different nodes. This function generalizes the empirical\nsettings used in other methods: when a = 0 and b = 1, we derive the distribution based on the\nnormalized node degree used in [49]; when b = 0, we assume the distribution is uniform as the work\nin [37, 44] does. We find that the node distributions have a huge influence on the stability and the\nperformance of our learning algorithms, which will be discussed in the following sections.\nOptimal transports For graph analysis, we can leverage prior knowledge to get a better regularization\nof optimal transport. 
Generally, the nodes with similar local topology should be matched with a\nhigh probability. Therefore, given two node distributions $\\mu_s$ and $\\mu_t$, we construct a node-based\ncost matrix $C_{node} \\in \\mathbb{R}^{|\\mathcal{V}_s| \\times |\\mathcal{V}_t|}$, whose element is $c_{ij} = |\\mu^s_i - \\mu^t_j|$, and add a regularization term\n$\\langle C_{node}, T^{(n)} \\rangle$ to (3). As a result, in the learning phase, we replace the $L(C_s, C_t, T^{(n)})$ in (3) with\n$L(C_s, C_t, T^{(n)}) + \\tau C_{node}$, where $\\tau$ controls the significance of $C_{node}$. Introducing the proposed\nregularizer helps us measure the similarity between nodes directly, which extends our GW discrepancy\nto the fused GW discrepancy in [44, 43]. The main difference here is that we\nuse the proximal gradient method to calculate the discrepancy, rather than the conditional gradient\nmethod in [43].\nBarycenter graphs When learning GWB, the work in [37] fixed the node distribution to be uniform.\nIn practice, however, both the node distribution of the barycenter graph and its optimal transports\nto observed graphs are unknown. In such a situation, we need to first estimate the node distribution\n$\\bar{\\mu} = [\\bar{\\mu}_1, ..., \\bar{\\mu}_{|\\bar{\\mathcal{V}}|}]$. Without loss of generality, we assume that the node distribution of the barycenter\ngraph is sorted, i.e., $\\bar{\\mu}_1 \\geq ... \\geq \\bar{\\mu}_{|\\bar{\\mathcal{V}}|}$. We estimate the node distribution via the weighted average of\nthe sorted and re-sampled node distributions of observed graphs:\n$$\\bar{\\mu} = \\sum_{m=1}^{M} \\omega_m \\, \\mathrm{interpolate}_{|\\bar{\\mathcal{V}}|}(\\mathrm{sort}(\\mu_m)), \\quad (6)$$\nwhere sort($\\cdot$) sorts the elements of the input vector in descending order, and $\\mathrm{interpolate}_{|\\bar{\\mathcal{V}}|}(\\cdot)$ samples\n$|\\bar{\\mathcal{V}}|$ values from the input vector via bilinear interpolation. Given the node distribution, we initialize\nthe optimal transports via the method mentioned above.\nAlgorithms 1 and 2 show the details of our method, where \u201c$\\odot$\u201d and \u201c$\\cdot/\\cdot$\u201d represent elementwise\nmultiplication and division, respectively. The GWL framework for the tasks in Figures 1(a)-1(d) is\nimplemented based on these two algorithms, with details in Appendix A.1.\n\nAlgorithm 1 ProxGrad($G_s$, $G_t$, $\\gamma$)\n1: Set n = 0, $a = \\mu_s$.\n2: Calculate $C_{node}$ with $c_{ij} = |\\mu^s_i - \\mu^t_j|$.\n3: Initialize $T^{(n)} = \\mu_s \\mu_t^\\top$.\n4: While not converge\n5:   $G = \\exp(-(\\tau C_{node} + L(C_s, C_t, T^{(n)})) / \\gamma) \\odot T^{(n)}$.\n6:   $b = \\mu_t / (G^\\top a)$, and $a = \\mu_s / (G b)$.\n7:   $T^{(n+1)} = \\mathrm{diag}(a) \\, G \\, \\mathrm{diag}(b)$, then n = n + 1.\n8: Output: $\\hat{T} = T^{(n)}$.\n\nAlgorithm 2 GWB($\\{G_m\\}_{m=1}^{M}$, $\\gamma$, $|\\bar{\\mathcal{V}}|$, $\\omega$)\n1: Set n = 0.\n2: Initialize $\\bar{\\mu}$ via (6). $\\bar{C}^{(n)} = \\mathrm{diag}(\\bar{\\mu})$.\n3: While not converge\n4:   For m = 1, ..., M\n5:     $T_m^{(n+1)} = \\mathrm{ProxGrad}(G_m, \\bar{G}^{(n)}, \\gamma)$.\n6:   Calculate $\\bar{C}^{(n+1)}$ via (4).\n7:   n = n + 1.\n8: Output: $\\hat{T}_m = T_m^{(n)}$ for m = 1, ..., M.\n\n3.2 A recursive K-partition mechanism for large-scale graph matching\nAssume that the observed graphs have comparable size, whose number of nodes and edges are\ndenoted as V and E, respectively. When using the proximal gradient method directly to calculate the\nGW discrepancy between two graphs, the time complexity, in the worst case, is $\\mathcal{O}(V^3)$ because the\n$L(C_s, C_t, T^{(n)})$ in (3) involves $C_s T C_t^\\top$. Even if we consider the sparsity of edges and implement\nsparse matrix multiplications, the time complexity is still as high as $\\mathcal{O}(EV)$.\nTo improve the scalability of our GWL framework, we introduce a recursive K-partition mechanism,\nrecursively decomposing observed large graphs to a set of aligned small graphs. As shown in\nFigure 2(a), given two graphs, we first calculate their barycenter graph (with K nodes) and achieve\ntheir joint K-way partitioning. 
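
Algorithm 1, together with the node distribution in (5), can be sketched in plain Python for small dense graphs. This is a minimal illustration under stated assumptions, not the authors' released implementation: the function names (`node_distribution`, `gw_cost`, `prox_grad`), the default values of `gamma`, `tau`, `outer` and `inner`, and the choice to run several Sinkhorn scaling steps per outer iteration are all hypothetical choices made here for readability.

```python
import math

def node_distribution(adj, a=1.0, b=1.0):
    # Eq. (5): mu is proportional to (degree + a) ** b.
    raw = [(sum(1 for w in row if w != 0) + a) ** b for row in adj]
    total = sum(raw)
    return [r / total for r in raw]

def matmul(A, B):
    # Dense matrix product on nested lists.
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def gw_cost(Cs, Ct, mu_s, mu_t, T):
    # L(Cs, Ct, T) = Cs mu_s 1^T + 1 mu_t^T Ct^T - 2 Cs T Ct^T, as in (3).
    left = [sum(c * m for c, m in zip(row, mu_s)) for row in Cs]
    right = [sum(c * m for c, m in zip(row, mu_t)) for row in Ct]
    cross = matmul(matmul(Cs, T), transpose(Ct))
    return [[left[i] + right[j] - 2.0 * cross[i][j]
             for j in range(len(Ct))] for i in range(len(Cs))]

def prox_grad(Cs, Ct, mu_s, mu_t, gamma=0.1, tau=0.0, outer=30, inner=10):
    ns, nt = len(mu_s), len(mu_t)
    # Node-based cost: c_ij = |mu_s[i] - mu_t[j]|.
    c_node = [[abs(p - q) for q in mu_t] for p in mu_s]
    # Start from the product of the two node distributions.
    T = [[p * q for q in mu_t] for p in mu_s]
    a = list(mu_s)
    for _ in range(outer):
        cost = gw_cost(Cs, Ct, mu_s, mu_t, T)
        # Kernel of the KL-proximal sub-problem (step 5 of Algorithm 1).
        G = [[math.exp(-(tau * c_node[i][j] + cost[i][j]) / gamma) * T[i][j]
              for j in range(nt)] for i in range(ns)]
        # Sinkhorn scaling (step 6); repeating it is an assumption of this sketch.
        for _ in range(inner):
            b = [mu_t[j] / sum(G[i][j] * a[i] for i in range(ns)) for j in range(nt)]
            a = [mu_s[i] / sum(G[i][j] * b[j] for j in range(nt)) for i in range(ns)]
        # Step 7: T = diag(a) G diag(b).
        T = [[a[i] * G[i][j] * b[j] for j in range(nt)] for i in range(ns)]
    return T
```

Because the row scaling `a` is updated last, each row of the returned transport sums to the corresponding entry of `mu_s` up to rounding, while the column sums approach `mu_t` as the scaling converges. The nested-list arithmetic costs cubic time and is only meant for toy graphs; the sparse products discussed in this section are what make the method scale.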
For each node of the barycenter graph, the corresponding sub-\ngraphs extracted from the observed two graphs construct an aligned sub-graph pair, shown as the\ndotted frames connected with grey circles in Figure 2(a). For each aligned sub-graph pair, we\nfurther calculate its barycenter graph and decompose the pair into more and smaller sub-graph pairs.\nRepeating the above step, we \ufb01nally calculate the GW discrepancy between the sub-graphs in each\npair, and \ufb01nd the correspondence between their nodes. Note that this recursive mechanism is also\napplicable to multi-graph matching: for multiple graphs, in the \ufb01nal step we calculate the GWB\namong the sub-graphs in each set. The details of our S-GWL method are provided in Appendix A.2.\nComplexity analysis In Table 1, we compare the time and memory complexity of our S-GWL method\nwith other matching methods. The Hungarian algorithm [24] has time complexity O(V 3) [17, 33, 50].\nDenoting the largest node degree in a graph as d, the time complexity of GHOST [35] is O(d4).\nThe methods above take the graph af\ufb01nity matrix as input, so their memory complexity in the worst\ncase is O(V 4). MI-GRAAL [23], HubAlign [19] and NETAL [32] are relatively ef\ufb01cient, with time\ncomplexity O(V E + V 2 log V ), O(V 2 log V ) and O(E2 + EV log V ), respectively. CPD+Emb\n\ufb01rst learns D-dimensional node embeddings [18], and then registers the embeddings by the CPD\nmethod [31], whose time complexity is O(DV 2). The memory complexity of these four methods\nis O(V 2). For GW discrepancy-based methods, the GWL+Emb in [49] achieves graph matching\nand node embedding jointly. It uses the distance matrix of node embeddings and breaks the sparsity\nof edges, so its time complexity is O(V 3) and memory complexity is O(V 2). The time complexity\nof GWL is O(V E), but its memory complexity is still O(V 2) because the L(Cs, Ct, T (n)) in (3)\nis a dense matrix. 
Our S-GWL combines the recursive mechanism with the regularized proximal\ngradient method and implements the $C_s T^{(n)} C_t^\\top$ in (3) by sparse matrix multiplications. Ideally, we\ncan apply $R = \\lfloor \\log_K V \\rfloor$ recursions. In the r-th recursion we calculate $K^r$ barycenter graphs for $K^r$\nsub-graph pairs. The sub-graphs in each pair have $\\mathcal{O}(V / K^r)$ nodes. As a result, we have\nProposition 3.1. Suppose that we have M graphs, each of which has V nodes and E edges. With\nthe help of the recursive K-partition mechanism, the time complexity of our S-GWL method is\n$\\mathcal{O}(M K (E + V) \\log_K V)$, and its memory complexity is $\\mathcal{O}(M (E + V K))$.\n\nFigure 2: (a) An illustration of S-GWL (scheme of our S-GWL method). (b) Comparisons on runtime of GWL, S-GWL (K2 R3), S-GWL (K4 R2) and S-GWL (K8 R1).\n\nTable 1: Comparisons for graph matching methods on time and memory complexity.\nMethod | Time O(\u00b7) | Memory O(\u00b7)\nHungarian | V^3 | V^4\nGHOST* | d^4 | V^4\nMI-GRAAL | VE + V^2 log V | V^2\nHubAlign | V^2 log V | V^2\nNETAL | E^2 + EV log V | V^2\nCPD+Emb | DV^2 | V^2\nGWL+Emb | V^3 | V^2\nGWL | VE | V^2\nS-GWL | 2(E + V) log V | E + 2V\n* d is the largest node degree in a graph.\n\nChoosing K = 2 and ignoring the number of graphs, we obtain the complexity shown in Table 1.\nOur S-GWL has lower computational time complexity and memory requirements than many existing\nmethods. Figure 2(b) visualizes the runtime of GWL and S-GWL on matching synthetic graphs. The\nS-GWL methods with different configurations (i.e., the number of partitions K and that of recursions\nR) are consistently faster than GWL. 
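
To make the gap between the GWL and S-GWL entries of Table 1 concrete, the leading terms of Proposition 3.1 can be evaluated for a given graph size. The sketch below is only an order-of-magnitude illustration: the function names are hypothetical, constant factors are ignored, and the edge count used in the usage note is an assumed value rather than one reported in the paper.

```python
import math

def gwl_time(V, E):
    # Leading term of plain GWL with sparse products: O(V E).
    return V * E

def sgwl_time(V, E, K=2, M=1):
    # Leading term from Proposition 3.1: O(M K (E + V) log_K V).
    return M * K * (E + V) * math.log(V, K)

def sgwl_memory(V, E, K=2, M=1):
    # Leading term from Proposition 3.1: O(M (E + V K)).
    return M * (E + V * K)

def gwl_memory(V):
    # Plain GWL stores a dense V x V cost matrix: O(V^2).
    return V * V
```

For instance, with V = 2000 and an assumed E = 20000, `gwl_time(2000, 20000)` is far larger than `sgwl_time(2000, 20000)`, and `sgwl_memory(2000, 20000)` is far smaller than `gwl_memory(2000)`. Real runtimes also depend on iteration counts and implementation details that these expressions leave out.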
More detailed analysis is provided in Appendix A.3.\n4 Related Work\nGromov-Wasserstein learning GW discrepancy has been applied in many matching problems,\ne.g., registering 3D objects [28, 29] and matching vocabulary sets between different languages [2].\nFocusing on graphs, a fused Gromov-Wasserstein distance is proposed in [44, 43], combining GW\ndiscrepancy with Wasserstein discrepancy [46]. The work in [49] further takes node embedding\ninto account, learning the GW discrepancy between two graphs and their node embeddings jointly.\nThe appropriateness of these methods is supported by [11], which proves that GW discrepancy is a\npseudometric on measure graphs. Recently, an adversarial learning method based on GW discrepancy\nis proposed in [9], which jointly trains two generative models in incomparable spaces. The work\nin [37] further proposes Gromov-Wasserstein barycenters for clustering distributions and interpolating\nshapes. Currently, GW discrepancy is mainly calculated based on Sinkhorn iterations [41, 15, 5, 37],\nwhose applications to large-scale graphs are challenging because of its high complexity. Our S-GWL\nmethod is the \ufb01rst attempt to make GW discrepancy applicable to large-scale graph analysis.\nGraph partitioning and graph matching Graph partitioning is important for community detection\nin networks. Many graph partitioning methods have been proposed, such as Metis [21], EdgeBe-\ntweenness [16], FastGreedy [12], Label Propagation [38], Louvain [6] and Fluid Community [34].\nAll of these methods explore the clustering structure of nodes heuristically based on the modularity-\nmaximization principle [16, 12]. Graph matching is important for network alignment [39, 40, 54]\nand 2D/3D object registration [31, 51, 20, 53]. Traditional methods formulate graph matching as a\nquadratic assignment problem (QAP) and solve it based on the Hungarian algorithm [17, 33, 51, 50],\nwhich are only applicable to small graphs. 
For large graphs like protein networks, many heuristic\nmethods have been proposed, such as GRAAL [22], IsoRank [40], PISwap [10], MAGNA++ [45],\nNETAL [32], HubAlign [19], and GHOST [35], which mainly focus on two-graph matching and are\nsensitive to the noise in graphs. With the help of GW discrepancy, our work establishes a uni\ufb01ed\nframework for graph partitioning and matching, that can be readily extended to multi-graph cases.\n5 Experiments\nThe implementation of our S-GWL method can be found at https://github.com/HongtengXu/s-gwl.\nWe compare it with state-of-the-art methods for graph partitioning and matching. All the methods are\nrun on an Intel i7 CPU with 4GB memory. Implementation details and a further set of experimental\nresults are provided in Appendix B.\n5.1 Graph partitioning\nWe \ufb01rst verify the performance of the GWL framework on graph partitioning, comparing it with the\nfollowing four baselines: Metis [21], FastGreedy [12], Louvain [6], and Fluid Community [34].\nWe consider synthetic and real-world data. Similar to [52], we compare these methods in terms of\nadjusted mutual information (AMI) and runtime. Each synthetic graph is a Gaussian random partition\ngraph with N nodes and K clusters. The size of each cluster is drawn from a normal distribution\nN (200, 10). The nodes are connected within clusters with probability pin and between clusters with\nprobability pout. 
The ratio pout\nindicates the clearness of the clustering structure, and accordingly\npin\n\n6\n\n\fTable 2: Comparisons for graph partitioning methods on AMI, time complexity and runtime (second).\n\nMetis\n\nMethod\n\nLouvain\nTime complexity O(V +E+K log K) O(V E log V ) O(V log V )\nAMI Time\n(N, pin, pout)\n0.747 22.889\n(4000, 0.2, 0.05)\n0.574 95.114\n(4000, 0.2, 0.1)\n0.005 290.846\n(4000, 0.2, 0.15)\n\nAMI Time\n0.247 55.435\n0.064 65.441\n0.002 80.322\n\nAMI\n0.413\n0.009\n0.002\n\nTime\n1.744\n2.340\n3.592\n\nFastGreedy\n\nFluid\nO(E)\n\nAMI Time\n0.776 21.580\n0.577 111.043\n0.005 203.225\n\nGWL\n\nO((E + V )K)\nAMI\nTime\n0.812\n13.033\n0.590\n12.740\n0.012\n12.901\n\nTable 3: Comparisons for graph partitioning methods on AMI.\n\nMethod\nDataset\nEU-Email\n\nIndian-Village\n\nMetis\n\nFastGreedy\n\nLouvain\n\nRaw Noisy Raw Noisy Raw Noisy Raw Noisy Raw Noisy\n0.349\n0.421\n0.664\n0.834\n\n0.272 \u2014 0.338\n0.633 \u2014 0.401\n\n0.312\n0.882\n\n0.459\n0.857\n\n0.118\n0.275\n\n0.434\n0.880\n\n0.246\n0.513\n\nFluid\n\nGWL\n\n\u201c\u2014\u201d: Fluid is inapplicable when the networks have disconnected nodes or sub-graphs.\n\nthe dif\ufb01culty of partitioning. We set N = 4000, pin = 0.2, and pout 2{ 0.05, 0.1, 0.15}. Under\neach con\ufb01guration (N, pin, pout), we simulate 10 graphs. For each method, its average performance\non these 10 graphs is listed in Table 2. GWL outperforms the alternatives consistently on AMI.\nAdditionally, as shown in Table 2, GWL has time complexity comparable to other methods, especially\nwhen the graph is sparse, e.g., E = O(V log V ). According to the runtime in practice, GWL is faster\nthan most baselines except Metis, likely because Metis is implemented in the C language while GWL\nand other methods are based on Python.\nTable 3 lists the performance of different methods on two real-world datasets. The \ufb01rst dataset is the\nemail network from a large European research institution [25]. 
The network contains 1,005 nodes\nand 25,571 edges. The edge $(v_i, v_j)$ in the network means that person $v_i$ sent person $v_j$ at least one\nemail, and each node in the network belongs to exactly one of 42 departments at the research institute.\nThe second dataset is the interactions among 1,991 villagers in 12 Indian villages [3]. Furthermore,\nto verify the robustness of GWL to noise, we not only consider the raw data of these two datasets\nbut also create their noisy versions by adding 10% more noisy edges between different communities\n(i.e., departments and villages). Experimental results show that GWL is at least comparable to its\ncompetitors on raw data, and it is more robust to noise than other methods.\n\n5.2 Graph matching\nFor two-graph matching, we compare our S-GWL method with the following baselines: PISwap [10],\nGHOST [35], MI-GRAAL [23], MAGNA++ [45], HubAlign [19], NETAL [32], CPD+Emb [18,\n31], the GWL framework based on Algorithm 1, and the GWL+Emb in [49]. We test all methods\non both synthetic and real-world data. For each method, given the learned correspondence set $S$ and\nthe ground-truth correspondence set $S_{real}$, we calculate node correctness as $NC = |S \\cap S_{real}| / |S| \\times 100\\%$. The runtime of each method is recorded as well.\nIn the synthetic dataset, each source graph $G(\\mathcal{V}_s, \\mathcal{E}_s)$ obeys a Gaussian random partition model [7] or\nBarab\u00e1si-Albert model [4]. For each source graph, we generate a target graph by adding $|\\mathcal{V}_s| \\times q\\%$\nnoisy nodes and $|\\mathcal{E}_s| \\times q\\%$ noisy edges to the source graph. Figure 1(e) compares our S-GWL with the\nbaselines when $|\\mathcal{V}_s| = 2000$ and q = 5. For each method, its average node correctness and runtime\non matching 10 synthetic graph pairs are plotted. Compared with existing heuristic methods, GW\ndiscrepancy-based methods (GWL+Emb, GWL and S-GWL) obtain much higher node correctness.\nGWL+Emb achieves the highest node correctness, with runtime comparable to many baselines. 
Our\nGWL framework does not learn node embeddings when matching graphs, so it is slightly worse\nthan GWL+Emb on node correctness but achieves about 10 times acceleration. Our S-GWL method\nfurther accelerates GWL with the help of the recursive mechanism. It obtains high node correctness\nand makes its runtime comparable to the fastest methods (HubAlign and NETAL).\nIn addition to graph matching on synthetic data, we also consider two real-world matching tasks. The\nfirst task is matching the protein-protein interaction (PPI) network of yeast with its noisy version.\nThe PPI network of yeast contains 1,004 proteins and their 4,920 high-confidence interactions.\nIts noisy version contains q% more low-confidence interactions, and q \u2208 {5, 10, 15, 20, 25}. The\ndataset is available on https://www3.nd.edu/~cone/MAGNA++/. The second task is matching user\naccounts in different communication networks. The dataset is available on\nhttp://vacommunity.org/VAST+Challenge+2018+MC3, which records the communications among a company\u2019s employees.\nFollowing the work in [49], we extract 622 employees and their call-network and email-network.\n\nTable 4: Comparisons for graph matching methods on node correctness (%) and runtime (second).\n\nDataset\nMethod\nPISwap\nGHOST\n\nYeast 5% noise Yeast 15% noise Yeast 25% noise MC3 sparse\nTime\nNC\n10.27\n0.10\n17.86\n11.06\nMI-GRAAL 
18.03\n72.89\n425.16\n48.13\nMAGNA++\n2.11\n50.00\nHubAlign\n1.23\n6.87\nNETAL\n87.54\n3.59\nCPD+Emb\nGWL+Emb\n83.66\n608.76\n89.43\n82.37\n81.08\n8.39\n\nTime\n22.09\n35.54\n240.03\n624.17\n3.89\n2.09\n108.62\n1537.93\n210.86\n74.64\n\nTime\n15.80\n25.67\n189.21\n603.29\n3.27\n1.91\n103.22\n1340.58\n190.97\n68.58\n\nTime\n18.31\n30.22\n202.77\n630.60\n3.50\n2.06\n110.19\n1499.20\n212.16\n70.06\n\nNC\n0.10\n0.40\n6.87\n25.04\n35.16\n0.90\n2.09\n66.63\n65.34\n61.85\n\nNC\n0.00\n0.30\n5.18\n13.61\n12.85\n1.00\n2.00\n57.97\n58.76\n56.27\n\nNC\n6.32\n21.27\n35.53\n7.88\n36.21\n36.87\n4.35\n40.45\n34.21\n36.92\n\nGWL\nS-GWL\n\nMC3 dense\nTime\nNC\n11.81\n0.00\n22.90\n0.03\n0.64\n197.65\n447.86\n0.09\n2.29\n3.86\n1.30\n1.77\n95.68\n0.48\n4.23\n831.80\n93.94\n3.96\n4.03\n9.01\n\nTable 5: Comparisons for multi-graph matching methods on yeast networks.\n\nMethod | 3 graphs NC@1 | 3 graphs NC@all | 4 graphs NC@1 | 4 graphs NC@all | 5 graphs NC@1 | 5 graphs NC@all | 6 graphs NC@1 | 6 graphs NC@all\nMultiAlign | 62.97 | 45.19 | \u2014 | \u2014 | \u2014 | \u2014 | \u2014 | \u2014\nGWL | 63.84 | 46.22 | 68.73 | 39.14 | 71.61 | 31.57 | 76.49 | 28.39\nS-GWL | 60.06 | 43.33 | 68.53 | 38.45 | 73.21 | 33.27 | 76.99 | 29.68\n\nFor each communication network, we construct a dense version and a sparse one: the dense version\nkeeps all the communications (edges) among the employees, while the sparse version only preserves\nthe communications happening more than 8 times. We test different methods on i) matching yeast\u2019s\nPPI network with its 5%, 15% and 25% noisy versions; and ii) matching the employee call-network\nwith their email-network in both sparse and dense cases. Table 4 shows the performance of various\nmethods in these two tasks. 
Similar to the experiments on synthetic data, the GW discrepancy-based\nmethods outperform other methods on node correctness, especially for highly-noisy graphs, and our\nS-GWL method achieves a good trade-off between accuracy and efficiency.\nGiven the PPI network of yeast and its 5 noisy versions, we test GWL and S-GWL for multi-graph\nmatching. We consider several existing multi-graph matching methods and find that the methods\nin [33, 51, 50] are not applicable to graphs with hundreds of nodes because i) their time\ncomplexity is at least O(V^3), and ii) they suffer from inadequate memory on our machine (with 4GB\nmemory) because their memory complexity in the worst case is O(V^4). The IsoRankN in [26] can\nalign multiple PPI networks jointly, but it needs confidence scores of protein pairs as input, which are\nnot available for our dataset. The only applicable baseline we are aware of is the MultiAlign in [54].\nHowever, it can only achieve three-graph matching. Table 5 lists the performance of various methods.\nGiven learned correspondence sets, each of which is a set of matched nodes from different graphs,\nNC@1 represents the percentage of sets containing at least one pair of correctly matched nodes, and\nNC@all represents the percentage of sets in which any two nodes are matched correctly. Both\nGWL and S-GWL obtain comparable performance to MultiAlign on three-graph matching, and GWL\nis the best. When the number of graphs increases, NC@1 increases while NC@all decreases for all\nthe methods, and S-GWL becomes even better than GWL.\n\n6 Conclusion and Future Work\nWe have developed a scalable Gromov-Wasserstein learning method, achieving large-scale graph\npartitioning and matching in a unified framework, with theoretical support. Experiments show that our\napproach outperforms state-of-the-art methods in many situations. However, it should be noted that\nour S-GWL method is sensitive to its hyperparameters. 
Specifically, we observed in our experiments\nthat the \u03b3 in (3) should be set carefully according to observed graphs. Generally, for large-scale graphs\nwe have to use a large \u03b3 and solve (3) with many iterations. The a and b in (5) are also significant for\nthe performance of our method. The settings of these hyperparameters and their influences are shown\nin Appendix B. In the future, we will further study the influence of hyperparameters on the rate of\nconvergence and set the hyperparameters adaptively according to observed data. Additionally, our\nS-GWL method can decompose a large graph into many independent small graphs, so we plan to\nfurther accelerate it by parallel processing and/or distributed learning.\nAcknowledgements This research was supported in part by DARPA, DOE, NIH, ONR and NSF. We\nthank Dr. Hongyuan Zha for helpful discussions.\n\nReferences\n[1] J. Altschuler, J. Weed, and P. Rigollet. Near-linear time approximation algorithms for optimal\ntransport via sinkhorn iteration. In Advances in Neural Information Processing Systems, pages\n1964\u20131974, 2017.\n\n[2] D. Alvarez-Melis and T. Jaakkola. Gromov-wasserstein alignment of word embedding spaces.\nIn Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing,\npages 1881\u20131890, 2018.\n\n[3] A. Banerjee, A. G. Chandrasekhar, E. Duflo, and M. O. Jackson. The diffusion of microfinance.\nScience, 341(6144):1236498, 2013.\n\n[4] A.-L. Barab\u00e1si et al. Network science. Cambridge university press, 2016.\n\n[5] J.-D. Benamou, G. Carlier, M. Cuturi, L. Nenna, and G. Peyr\u00e9. Iterative Bregman projections\nfor regularized transportation problems. SIAM Journal on Scientific Computing, 37(2):A1111\u2013A1138,\n2015.\n\n[6] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities\nin large networks. Journal of statistical mechanics: theory and experiment, 2008(10):P10008,\n2008.\n\n[7] U. 
Brandes, M. Gaertler, and D. Wagner. Experiments on graph clustering algorithms. In\nEuropean Symposium on Algorithms, pages 568\u2013579. Springer, 2003.\n\n[8] A. M. Bronstein, M. M. Bronstein, R. Kimmel, M. Mahmoudi, and G. Sapiro. A Gromov-Hausdorff\nframework with diffusion geometry for topologically-robust non-rigid shape matching.\nInternational Journal of Computer Vision, 89(2-3):266\u2013286, 2010.\n\n[9] C. Bunne, D. Alvarez-Melis, A. Krause, and S. Jegelka. Learning generative models across\nincomparable spaces. NeurIPS Workshop on Relational Representation Learning, 2018.\n\n[10] L. Chindelevitch, C.-Y. Ma, C.-S. Liao, and B. Berger. Optimizing a global alignment of protein\ninteraction networks. Bioinformatics, 29(21):2765\u20132773, 2013.\n\n[11] S. Chowdhury and F. M\u00e9moli. The Gromov-Wasserstein distance between networks and stable\nnetwork invariants. arXiv preprint arXiv:1808.04337, 2018.\n\n[12] A. Clauset, M. E. Newman, and C. Moore. Finding community structure in very large networks.\nPhysical review E, 70(6):066111, 2004.\n\n[13] L. P. Cordella, P. Foggia, C. Sansone, and M. Vento. A (sub) graph isomorphism algorithm\nfor matching large graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence,\n26(10):1367\u20131372, 2004.\n\n[14] T. Cour, P. Srinivasan, and J. Shi. Balanced graph matching. In NIPS, pages 313\u2013320, 2007.\n\n[15] M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in\nneural information processing systems, pages 2292\u20132300, 2013.\n\n[16] M. Girvan and M. E. Newman. Community structure in social and biological networks.\nProceedings of the national academy of sciences, 99(12):7821\u20137826, 2002.\n\n[17] S. Gold and A. Rangarajan. A graduated assignment algorithm for graph matching. IEEE\nTransactions on Pattern Analysis and Machine Intelligence, 18(4):377\u2013388, 1996.\n\n[18] A. Grover and J. Leskovec. 
node2vec: Scalable feature learning for networks. In KDD, pages\n855\u2013864, 2016.\n\n[19] S. Hashemifar and J. Xu. Hubalign: An accurate and efficient method for global alignment of\nprotein\u2013protein interaction networks. Bioinformatics, 30(17):i438\u2013i444, 2014.\n\n[20] S.-H. Jun, S. W. Wong, J. Zidek, and A. Bouchard-C\u00f4t\u00e9. Sequential graph matching with\nsequential monte carlo. In AISTATS, pages 1075\u20131084, 2017.\n\n[21] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular\ngraphs. SIAM Journal on scientific Computing, 20(1):359\u2013392, 1998.\n\n[22] O. Kuchaiev, T. Milenkovi\u0107, V. Memi\u0161evi\u0107, W. Hayes, and N. Pr\u017eulj. Topological network\nalignment uncovers biological function and phylogeny. Journal of the Royal Society Interface,\npage rsif20100063, 2010.\n\n[23] O. Kuchaiev and N. Pr\u017eulj. Integrative network alignment reveals large regions of global\nnetwork similarity in yeast and human. Bioinformatics, 27(10):1390\u20131396, 2011.\n\n[24] H. W. Kuhn. The hungarian method for the assignment problem. Naval research logistics\nquarterly, 2(1-2):83\u201397, 1955.\n\n[25] J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection.\nhttp://snap.stanford.edu/data, June 2014.\n\n[26] C.-S. Liao, K. Lu, M. Baym, R. Singh, and B. Berger. Isorankn: spectral methods for global\nalignment of multiple protein networks. Bioinformatics, 25(12):i253\u2013i258, 2009.\n\n[27] N. Malod-Dognin and N. Pr\u017eulj. L-GRAAL: Lagrangian graphlet-based network aligner.\nBioinformatics, 31(13):2182\u20132189, 2015.\n\n[28] F. M\u00e9moli. Spectral Gromov-Wasserstein distances for shape matching. In ICCV Workshops,\npages 256\u2013263, 2009.\n\n[29] F. M\u00e9moli. Gromov-Wasserstein distances and the metric approach to object matching.\nFoundations of computational mathematics, 11(4):417\u2013487, 2011.\n\n[30] F. 
M\u00e9moli and G. Sapiro. Comparing point clouds. In Proceedings of the 2004\nEurographics/ACM SIGGRAPH symposium on Geometry processing, pages 32\u201340, 2004.\n\n[31] A. Myronenko and X. Song. Point set registration: Coherent point drift. IEEE Transactions on\nPattern Analysis and Machine Intelligence, 32(12):2262\u20132275, 2010.\n\n[32] B. Neyshabur, A. Khadem, S. Hashemifar, and S. S. Arab. NETAL: A new graph-based method\nfor global alignment of protein\u2013protein interaction networks. Bioinformatics, 29(13):1654\u20131662,\n2013.\n\n[33] D. Pachauri, R. Kondor, and V. Singh. Solving the multi-way matching problem by permutation\nsynchronization. In Advances in neural information processing systems, pages 1860\u20131868,\n2013.\n\n[34] F. Par\u00e9s, D. Garcia-Gasulla, A. Vilalta, J. Moreno, E. Ayguad\u00e9, J. Labarta, U. Cort\u00e9s, and\nT. Suzumura. Fluid communities: A competitive and highly scalable community detection\nalgorithm. Complex Networks & Their Applications VI, pages 229\u2013240, 2018.\n\n[35] R. Patro and C. Kingsford. Global network alignment using multiscale spectral signatures.\nBioinformatics, 28(23):3105\u20133114, 2012.\n\n[36] G. Peyr\u00e9, M. Cuturi, et al. Computational optimal transport. Foundations and Trends\u00ae in\nMachine Learning, 11(5-6):355\u2013607, 2019.\n\n[37] G. Peyr\u00e9, M. Cuturi, and J. Solomon. Gromov-wasserstein averaging of kernel and distance\nmatrices. In International Conference on Machine Learning, pages 2664\u20132672, 2016.\n\n[38] U. N. Raghavan, R. Albert, and S. Kumara. Near linear time algorithm to detect community\nstructures in large-scale networks. Physical review E, 76(3):036106, 2007.\n\n[39] R. Sharan and T. Ideker. Modeling cellular machinery through biological network comparison.\nNature biotechnology, 24(4):427, 2006.\n\n[40] R. Singh, J. Xu, and B. Berger. Global alignment of multiple protein interaction networks with\napplication to functional orthology detection. 
Proceedings of the National Academy of Sciences,\n2008.\n\n[41] R. Sinkhorn and P. Knopp. Concerning nonnegative matrices and doubly stochastic matrices.\nPacific Journal of Mathematics, 21(2):343\u2013348, 1967.\n\n[42] K.-T. Sturm et al. On the geometry of metric measure spaces. Acta mathematica, 196(1):65\u2013131,\n2006.\n\n[43] T. Vayer, L. Chapel, R. Flamary, R. Tavenard, and N. Courty. Fused Gromov-Wasserstein\ndistance for structured objects: theoretical foundations and mathematical properties. arXiv\npreprint arXiv:1811.02834, 2018.\n\n[44] T. Vayer, L. Chapel, R. Flamary, R. Tavenard, and N. Courty. Optimal transport for structured\ndata. arXiv preprint arXiv:1805.09114, 2018.\n\n[45] V. Vijayan, V. Saraph, and T. Milenkovi\u0107. MAGNA++: Maximizing accuracy in global network\nalignment via both node and edge conservation. Bioinformatics, 31(14):2409\u20132411, 2015.\n\n[46] C. Villani. Optimal transport: Old and new, volume 338. Springer Science & Business Media,\n2008.\n\n[47] L. Wang, T. Lou, J. Tang, and J. E. Hopcroft. Detecting community kernels in large social\nnetworks. In 2011 IEEE 11th International Conference on Data Mining, pages 784\u2013793. IEEE,\n2011.\n\n[48] Y. Xie, X. Wang, R. Wang, and H. Zha. A fast proximal point method for Wasserstein distance.\narXiv preprint arXiv:1802.04307, 2018.\n\n[49] H. Xu, D. Luo, H. Zha, and L. Carin. Gromov-wasserstein learning for graph matching and\nnode embedding. arXiv preprint arXiv:1901.06003, 2019.\n\n[50] J. Yan, J. Wang, H. Zha, X. Yang, and S. Chu. Consistency-driven alternating optimization for\nmultigraph matching: A unified approach. IEEE Transactions on Image Processing, 24(3):994\u20131009,\n2015.\n\n[51] J. Yan, H. Xu, H. Zha, X. Yang, H. Liu, and S. Chu. A matrix decomposition perspective to\nmultiple graph matching. In ICCV, pages 199\u2013207, 2015.\n\n[52] Z. Yang, R. Algesheimer, and C. J. Tessone. 
A comparative analysis of community detection\nalgorithms on artificial networks. Scientific reports, 6:30750, 2016.\n\n[53] T. Yu, J. Yan, Y. Wang, W. Liu, et al. Generalizing graph matching beyond quadratic assignment\nmodel. In NIPS, pages 861\u2013871, 2018.\n\n[54] J. Zhang and P. S. Yu. Multiple anonymized social networks alignment. In ICDM, pages\n599\u2013608, 2015.", "award": [], "sourceid": 1742, "authors": [{"given_name": "Hongteng", "family_name": "Xu", "institution": "Infinia ML and Duke University"}, {"given_name": "Dixin", "family_name": "Luo", "institution": "Duke University"}, {"given_name": "Lawrence", "family_name": "Carin", "institution": "Duke University"}]}