{"title": "vGraph: A Generative Model for Joint Community Detection and Node Representation Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 514, "page_last": 524, "abstract": "This paper focuses on two fundamental tasks of graph analysis: community detection and node representation learning, which capture the global and local structures of graphs respectively. In existing literature, these two tasks are usually independently studied while they are actually highly correlated. We propose a probabilistic generative model called vGraph to learn community membership and node representation collaboratively. Specifically, we assume that each node can be represented as a mixture of communities, and each community is defined as a multinomial distribution over nodes. Both the mixing coefficients and the community distribution are parameterized by the low-dimensional representations of the nodes and communities. We designed an effective variational inference algorithm for the optimization through backpropagation, which regularizes the community membership of neighboring nodes to be similar in the latent space. Experimental results on multiple real-world graphs show that vGraph is very effective in both community detection and node representation learning, outperforming many competitive baselines in both tasks. 
We show that the framework of vGraph is quite flexible and can be easily extended to detect hierarchical communities.", "full_text": "vGraph: A Generative Model for Joint Community\n\nDetection and Node Representation Learning\n\nFan-Yun Sun1,2, Meng Qu2, Jordan Hoffmann2,3, Chin-Wei Huang2,4, Jian Tang2,5,6\n\n1National Taiwan University,\n\n2Mila-Quebec Institute for Learning Algorithms, Canada\n\n3Harvard University, USA\n\n4Element AI, Canada\n\n5HEC Montreal, Canada\n\n6CIFAR AI Research Chair\nb04902045@ntu.edu.tw\n\nAbstract\n\nThis paper focuses on two fundamental tasks of graph analysis: community detec-\ntion and node representation learning, which capture the global and local structures\nof graphs, respectively. In the current literature, these two tasks are usually in-\ndependently studied while they are actually highly correlated. We propose a\nprobabilistic generative model called vGraph to learn community membership and\nnode representation collaboratively. Speci\ufb01cally, we assume that each node can\nbe represented as a mixture of communities, and each community is de\ufb01ned as a\nmultinomial distribution over nodes. Both the mixing coef\ufb01cients and the commu-\nnity distribution are parameterized by the low-dimensional representations of the\nnodes and communities. We designed an effective variational inference algorithm\nwhich regularizes the community membership of neighboring nodes to be similar\nin the latent space. Experimental results on multiple real-world graphs show that\nvGraph is very effective in both community detection and node representation\nlearning, outperforming many competitive baselines in both tasks. We show that\nthe framework of vGraph is quite \ufb02exible and can be easily extended to detect\nhierarchical communities.\n\n1\n\nIntroduction\n\nGraphs, or networks, are a general and \ufb02exible data structure to encode complex relationships among\nobjects. 
Examples of real-world graphs include social networks, airline networks, protein-protein\ninteraction networks, and traf\ufb01c networks. Recently, there has been increasing interest from both\nacademic and industrial communities in analyzing graphical data. Examples span a variety of domains\nand applications such as node classi\ufb01cation [3, 26] and link prediction [8, 32] in social networks, role\nprediction in protein-protein interaction networks [16], and prediction of information diffusion in\nsocial and citation networks [22].\n\nOne fundamental task of graph analysis is community detection, which aims to cluster nodes into\nmultiple groups called communities. Each community is a set of nodes that are more closely\nconnected to each other than to nodes in different communities. A community level description is\nable to capture important information about a graph\u2019s global structure. Such a description is useful\nin many real-world applications, such as identifying users with similar interests in social networks\n[22] or proteins with similar functionality in biochemical networks [16]. Community detection has\nbeen extensively studied in the literature, and a number of methods have been proposed, including\nalgorithmic approaches [1, 5] and probabilistic models [10, 20, 36, 37]. A classical approach to\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\fdetect communities is spectral clustering [34], which assumes that neighboring nodes tend to belong\nto the same communities and detects communities by \ufb01nding the eigenvectors of the graph Laplacian.\n\nAnother important task of graph analysis is node representation learning, where nodes are described\nusing low-dimensional features. Node representations effectively capture local graph structure and\nare often used as features for many prediction tasks. 
Modern methods for learning node embeddings\n[11, 24, 26] have proved effective on a variety of tasks such as node classi\ufb01cation [3, 26], link\nprediction [8, 32] and graph visualization [27, 31].\n\nClustering, which captures the global structure of graphs, and learning node embeddings, which\ncaptures local structure, are typically studied separately. Clustering is often used for exploratory\nanalysis, while generating node embeddings is often done for predictive analysis. However, these\ntwo tasks are very correlated and it may be bene\ufb01cial to perform both tasks simultaneously. The\nintuition is that (1) node representations can be used as good features for community detection (e.g.,\nthrough K-means) [4, 25, 29], and (2) the node community membership can provide good contexts\nfor learning node representations [33]. However, how to leverage the relatedness of node clustering\nand node embedding in a uni\ufb01ed framework for joint community detection and node representation\nlearning is under-explored.\n\nIn this paper, we propose a novel probabilistic generative model called vGraph for joint community\ndetection and node representation learning. vGraph assumes that each node v can be represented as a\nmixture of multiple communities and is described by a multinomial distribution over communities\nz, i.e., p(z|v). Meanwhile, each community z is modeled as a distribution over the nodes v, i.e.,\np(v|z). vGraph models the process of generating the neighbors for each node. Given a node u, we\n\ufb01rst draw a community assignment z from p(z|u). This indicates which community the node is going\nto interact with. Given the community assignment z, we generate an edge (u, v) by drawing another\nnode v according to the community distribution p(v|z). Both the distributions p(z|v) and p(v|z) are\nparameterized by the low-dimensional representations of the nodes and communities. 
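As a toy illustration, the two sampling steps of this generative process can be sketched in a few lines (a minimal numpy sketch; the sizes and randomly initialized embeddings are hypothetical stand-ins for the learned parameters, not the trained model):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
N, K, d = 6, 2, 4                  # toy sizes: nodes, communities, embedding dim
phi = rng.normal(size=(N, d))      # node embeddings parameterizing p(z|u)
varphi = rng.normal(size=(N, d))   # node embeddings parameterizing p(v|z)
psi = rng.normal(size=(K, d))      # community embeddings

def sample_neighbor(u):
    """Generate one neighbor of node u: z ~ p(z|u), then v ~ p(v|z)."""
    z = rng.choice(K, p=softmax(phi[u] @ psi.T))     # draw community assignment
    v = rng.choice(N, p=softmax(psi[z] @ varphi.T))  # draw linked neighbor
    return int(z), int(v)

z, v = sample_neighbor(0)  # one generated edge (0, v) under community context z
```

Marginalizing out z in this sketch recovers the mixture p(v|u) = sum_z p(v|z) p(z|u) described above.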
As a result, this\napproach allows the node representations and the communities to interact in a mutually bene\ufb01cial way.\nWe also design a very effective algorithm for inference with backpropagation. We use variational\ninference for maximizing the lower-bound of the data likelihood. The Gumbel-Softmax [13] trick\nis leveraged since the community membership variables are discrete. Inspired by existing spectral\nclustering methods [6], we added a smoothness regularization term to the objective function of the\nvariational inference routine to ensure that community membership of neighboring nodes is similar.\nThe whole framework of vGraph is very \ufb02exible and general. We also show that it can be easily\nextended to detect hierarchical communities.\n\nIn the experiment section, we show results on three tasks: overlapping community detection, non-\noverlapping community detection, and node classi\ufb01cation\u2013 all using various real-world datasets. Our\nresults show that vGraph is very competitive with existing state-of-the-art approaches for these tasks.\nWe also present results on hierarchical community detection. Relevant source codes have been made\npublic 1.\n\n2 Related Work\n\nCommunity Detection. Many community detection methods are based on matrix factorization\ntechniques. Typically, these methods try to recover the node-community af\ufb01liation matrix by\nperforming a low-rank decomposition of the graph adjacency matrix or other related matrices\n[17, 18, 32, 36]. These methods are not scalable due to the complexity of matrix factorization,\nand their performance is restricted by the capacity of the bi-linear models. Many other studies\ndevelop generative models for community detection. Their basic idea is to characterize the generation\nprocess of graphs and cast community detection as an inference problem [37, 38, 39]. However, the\ncomputational complexity of these methods is also high due to complicated inference. 
Compared with these approaches, vGraph is more scalable and can be efficiently optimized with backpropagation and Gumbel-Softmax [13, 19]. Additionally, vGraph is able to learn and leverage the node representations for community detection.\n\n(a) vGraph\n\n(b) Hierarchical vGraph\n\nFigure 1: The diagram on the left represents the graphical model of vGraph and the diagram on the right represents the graphical model of the hierarchical extension. \u03c6_n is the embedding of node w_n, \u03c8 denotes the embedding of communities, and \u03d5 denotes the embeddings of nodes used in p(c|z). Refer to Eq. 2 and Eq. 3.\n\n1https://github.com/fanyun-sun/vGraph\n\nNode Representation Learning. The goal of node representation learning is to learn distributed representations of nodes in graphs so that nodes with similar local connectivity tend to have similar representations. Some representative methods include DeepWalk [24], LINE [26], node2vec [11] and GraRep [3]. Typically, these methods explore the local connectivity of each node by conducting random walks with either breadth-first search [24] or depth-first search [26]. Despite their effectiveness in a variety of applications, these methods mainly focus on preserving the local structure of graphs, therefore ignoring global community information. In vGraph, we address this limitation by treating the community label as a latent variable. This way, the community label can provide additional contextual information which enables the learned node representations to capture the global community information.\n\nFramework for node representation learning and community detection. There exists previous work [4, 14, 29, 30, 33] that attempts to solve community detection and node representation learning jointly. However, their optimization process alternates between community assignment and node representation learning instead of simultaneously solving both tasks [4, 30]. 
Compared with these\nmethods, vGraph is scalable and the optimization is done end-to-end.\n\nMixture Models. Methodologically, our method is related to mixture models, particularly topic\nmodels (e.g. PSLA [12] and LDA [2]). These methods simulate the generation of words in documents,\nin which topics are treated as latent variables, whereas we consider generating neighbors for each\nnode in a graph, and the community acts as a latent variable. Compared with these methods, vGraph\nparameterizes the distributions with node and community embeddings, and all the parameters are\ntrained with backpropagation.\n\n3 Problem De\ufb01nition\n\nGraphs are ubiquitous in the real-world. Two fundamental tasks on graphs are community detection\nand learning node embeddings, which focus on global and local graph structures respectively and\nhence are naturally complementary. In this paper, we study jointly solving these two tasks. Let\nG = (V, E) represent a graph, where V = {v1, . . . , vV } is a set of vertices and E = {eij} is the set of\nedges. Traditional graph embedding aims to learn a node embedding \u03c6i \u2208 Rd for each vi \u2208 V where\nd is predetermined. Community detection aims to extract the community membership F for each\nnode. Suppose there are K communities on the graph G, we can denote the community assignment\nof node vi as F(vi) \u2286 {1, ..., K}. We aim to jointly learn node embeddings \u03c6 and community\naf\ufb01liation of vertices F .\n\n4 Methodology\n\nIn this section, we introduce our generative approach vGraph, which aims at collaboratively learning\nnode representations and detecting node communities. Our approach assumes that each node can\nbelong to multiple communities representing different social contexts [7]. Each node should generate\ndifferent neighbors under different social contexts. vGraph parameterizes the node-community\ndistributions by introducing node and community embeddings. 
In this way, the node representations can benefit from the detection of node communities. Similarly, the detected community assignment can in turn improve the node representations. Inspired by existing spectral clustering methods [6], we add a smoothness regularization term that encourages linked nodes to be in the same communities.\n\n4.1 vGraph\n\nvGraph models the generation of node neighbors. It assumes that each node can belong to multiple communities. For each node, different neighbors will be generated depending on the community context. Based on the above intuition, we introduce a prior distribution p(z|w) for each node w and a node distribution p(c|z) for each community z. The generative process of each edge (w, c) can be naturally characterized as follows: for node w, we first draw a community assignment z \u223c p(z|w), representing the social context of w during the generation process. Then, the linked neighbor c is generated based on the assignment z through c \u223c p(c|z). Formally, this generation process can be formulated in a probabilistic way:\n\np(c|w) = \u2211_z p(c|z) p(z|w).   (1)\n\nvGraph parameterizes the distributions p(z|w) and p(c|z) by introducing a set of node embeddings and community embeddings. Note that different sets of node embeddings are used to parametrize the two distributions. Specifically, let \u03c6_i denote the embedding of node i used in the distribution p(z|w), \u03d5_i denote the embedding of node i used in p(c|z), and \u03c8_j denote the embedding of the j-th community. The prior distribution p_{\u03c6,\u03c8}(z|w) and the node distribution conditioned on a community p_{\u03c8,\u03d5}(c|z) are parameterized by two softmax models:\n\np_{\u03c6,\u03c8}(z = j|w) = exp(\u03c6_w^T \u03c8_j) / \u2211_{i=1}^{K} exp(\u03c6_w^T \u03c8_i),   (2)\n\np_{\u03c8,\u03d5}(c|z = j) = exp(\u03c8_j^T \u03d5_c) / \u2211_{c'\u2208V} exp(\u03c8_j^T \u03d5_{c'}).   (3)\n\nCalculating Eq. 
3 can be expensive as it requires summation over all vertices. Thus, for large datasets we can employ negative sampling as done in LINE [26] using the following objective function:\n\nlog \u03c3(\u03d5_c^T \u00b7 \u03c8_j) + \u2211_{i=1}^{K} E_{v\u223cP_n(v)}[log \u03c3(\u2212\u03d5_v^T \u00b7 \u03c8_j)],   (4)\n\nwhere \u03c3(x) = 1/(1 + exp(\u2212x)), P_n(v) is a noise distribution, and K is the number of negative samples. This, combined with stochastic optimization, enables our model to be scalable.\n\nTo learn the parameters of vGraph, we try to maximize the log-likelihood of the observed edges, i.e., log p_{\u03c6,\u03d5,\u03c8}(c|w). Since directly optimizing this objective is intractable for large graphs, we instead optimize the following evidence lower bound (ELBO) [15]:\n\nL = E_{z\u223cq(z|c,w)}[log p_{\u03c8,\u03d5}(c|z)] \u2212 KL(q(z|c, w) || p_{\u03c6,\u03c8}(z|w)),   (5)\n\nwhere q(z|c, w) is a variational distribution that approximates the true posterior distribution p(z|c, w), and KL(\u00b7||\u00b7) represents the Kullback-Leibler divergence between two distributions.\n\nSpecifically, we parametrize the variational distribution q(z|c, w) with a neural network as follows:\n\nq_{\u03c6,\u03c8}(z = j|w, c) = exp((\u03c6_w \u2299 \u03c6_c)^T \u03c8_j) / \u2211_{i=1}^{K} exp((\u03c6_w \u2299 \u03c6_c)^T \u03c8_i),   (6)\n\nwhere \u2299 denotes element-wise multiplication. We chose element-wise multiplication because it is symmetric and it forces the representation of the edge to be dependent on both nodes.\n\nThe variational distribution q(z|c, w) represents the community membership of the edge (w, c). Based on this, we can easily approximate the community membership distribution of each node w, i.e., p(z|w), by aggregating over all its neighbors:\n\np(z|w) = \u2211_c p(z, c|w) = \u2211_c p(z|w, c) p(c|w) \u2248 (1/|N(w)|) \u2211_{c\u2208N(w)} q(z|w, c),   (7)\n\nwhere N(w) is the set of neighbors of node w. 
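The edge-level posterior of Eq. 6 and the neighbor aggregation of Eq. 7 can be sketched as follows (a minimal numpy sketch; the embeddings and the adjacency list are hypothetical toy values, not trained parameters):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
N, K, d = 5, 3, 8
phi = rng.normal(size=(N, d))  # node embeddings (shared with the prior)
psi = rng.normal(size=(K, d))  # community embeddings
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}  # toy N(w)

def q_z(w, c):
    # Eq. 6: q(z = j | w, c) proportional to exp((phi_w * phi_c)^T psi_j),
    # where * is the element-wise product
    return softmax((phi[w] * phi[c]) @ psi.T)

def node_membership(w):
    # Eq. 7: p(z|w) approximated by averaging q(z|w, c) over neighbors c in N(w)
    return np.mean([q_z(w, c) for c in neighbors[w]], axis=0)

community = int(np.argmax(node_membership(0)))  # non-overlapping assignment
```

Because each q(z|w, c) is a proper distribution over the K communities, their average in `node_membership` is one as well.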
To infer non-overlapping communities, we can simply take the arg max of p(z|w). However, when detecting overlapping communities, instead of thresholding p(z|w) as in [14], we use\n\nF(w) = {arg max_k q(z = k|w, c)}_{c\u2208N(w)}.   (8)\n\nThat is, we assign each edge to one community and then map the edge communities to node communities by gathering nodes incident to all edges within each edge community, as in [1].\n\nComplexity. Here we show the complexity of vGraph. Sampling an edge takes constant time, thus calculating Eq. (4) takes O(d(M + 1)) time, where M is the number of negative samples and d is the dimension of embeddings (the node embeddings and community embeddings have the same dimension). Calculating Eq. (6) takes O(dK) time, where K is the number of communities. Thus, an iteration with one sample takes O(max(dM, dK)) time. In practice, the number of updates required is proportional to the number of edges, O(|E|); thus the overall time complexity of vGraph is O(|E| d max(M, K)).\n\n4.2 Community-smoothness Regularized Optimization\n\nFor optimization, we need to optimize the lower bound (5) w.r.t. the parameters of the variational distribution and the generative parameters. If z were continuous, the reparameterization trick [15] could be used. However, z is discrete in our case. In principle, we can still estimate the gradient using a score function estimator [9, 35]. However, the score function estimator suffers from high variance, even when used with a control variate. Thus, we use the Gumbel-Softmax reparametrization [13, 19] to obtain gradients for the evidence lower bound. More specifically, we use the straight-through Gumbel-Softmax estimator [13].\n\nA community can be defined as a group of nodes that are more similar to each other than to those outside the group [23]. For a non-attributed graph, two nodes are similar if they are connected and share similar neighbors. 
However, vGraph does not explicitly weight local connectivity in this way. To resolve this, inspired by existing spectral clustering studies [6], we augment our training objective with a smoothness regularization term that encourages the learned community distributions of linked nodes to be similar. Formally, the regularization term is given below:\n\nL_reg = \u03bb \u2211_{(w,c)\u2208E} \u03b1_{w,c} \u00b7 d(p(z|c), p(z|w)),   (9)\n\nwhere \u03bb is a tunable hyperparameter, \u03b1_{w,c} is a regularization weight, and d(\u00b7, \u00b7) is the distance between two distributions (squared difference in our experiments). Motivated by [25], we set \u03b1_{w,c} to be the Jaccard's coefficient of nodes w and c, which is given by:\n\n\u03b1_{w,c} = |N(w) \u2229 N(c)| / |N(w) \u222a N(c)|,   (10)\n\nwhere N(w) denotes the set of neighbors of w. The intuition behind this is that \u03b1_{w,c} serves as a measure of how similar the neighborhoods of the two nodes are: the higher the value of the Jaccard coefficient, the more the two nodes are encouraged to have similar distributions over communities.\n\nBy combining the evidence lower bound and the smoothness regularization term, the entire loss function we aim to minimize is given below:\n\nL = \u2212E_{z\u223cq_{\u03c6,\u03c8}(z|c,w)}[log p_{\u03c8,\u03d5}(c|z)] + KL(q_{\u03c6,\u03c8}(z|c, w) || p_{\u03c6,\u03c8}(z|w)) + L_reg.   (11)\n\nFor large datasets, negative sampling can be used for the first term.\n\n4.3 Hierarchical vGraph\n\nOne advantage of vGraph's framework is that it is very general and can be naturally extended to detect hierarchical communities. In this case, suppose we are given a d-level tree in which each tree node is associated with a community; the community assignment can then be represented as a d-dimensional path vector ~z = (z^(1), z^(2), ..., z^(d)), as shown in Fig. 1. 
Then, the generation process is formulated as below: (1) a tree path ~z is sampled from a prior distribution p_{\u03c6,\u03c8}(~z|w); (2) the context c is decoded from ~z with p_{\u03c8,\u03d5}(c|~z). Under this model, the likelihood of the network is\n\np_{\u03c6,\u03d5,\u03c8}(c|w) = \u2211_{~z} p_{\u03c8,\u03d5}(c|~z) p_{\u03c6,\u03c8}(~z|w).   (12)\n\nAt every node of the tree, there is an embedding vector associated with the community. Such a method is similar to the hierarchical softmax parameterization used in language models [21].\n\n5 Experiments\n\nAs vGraph can detect both overlapping and non-overlapping communities, we evaluate it on three tasks: overlapping community detection, non-overlapping community detection, and vertex classification.\n\n5.1 Datasets\n\nWe evaluate vGraph on 20 standard graph datasets. For non-overlapping community detection and node classification, we use 6 datasets: Citeseer, Cora, Cornell, Texas, Washington, and Wisconsin. For overlapping community detection, we use 14 datasets, including Facebook, Youtube, Amazon, Dblp, and Coauthor-CS. For Youtube, Amazon, and Dblp, we consider subgraphs with the 5 largest ground-truth communities due to the runtime of baseline methods. To demonstrate the scalability of our method, we additionally include visualization results on a large dataset \u2013 Dblp-full. Dataset statistics are provided in Table 1. More details about the datasets are provided in Appendix A.\n\n5.2 Evaluation Metric\n\nTable 1: Dataset Statistics. |V|: number of nodes, |E|: number of edges, K: number of communities, AS: average size of communities, AN: average number of communities that a node belongs to.\n\nDataset |V| |E| K AS AN\nNonoverlapping\nCornell 195 286 5 39.00 1\nTexas 187 298 5 37.40 1\nWashington 230 417 5 46.00 1\nWisconsin 265 479 5 53.00 1\nCora 2708 5278 7 386.86 1\nCiteseer 3312 4660 6 552.00 1\nOverlapping\nfacebook0 333 2519 24 13.54 0.98\nfacebook107 1034 26749 9 55.67 0.48\nfacebook1684 786 14024 17 45.71 0.99\nfacebook1912 747 30025 46 23.15 1.43\nfacebook3437 534 4813 32 6.00 0.36\nfacebook348 224 3192 14 40.50 2.53\nfacebook3980 52 146 17 3.41 1.12\nfacebook414 150 1693 7 25.43 1.19\nfacebook686 168 1656 14 34.64 2.89\nfacebook698 61 270 13 6.54 1.39\nYoutube 5346 24121 5 1347.80 1.26\nAmazon 794 2109 5 277.20 1.75\nDblp 24493 89063 5 5161.40 1.05\nCoauthor-CS 9252 33261 5 2920.60 1.58\nDblp-full 93432 335520 5000 22.45 1.20\n\nFor overlapping community detection, we use F1-Score and Jaccard Similarity to measure the performance of the detected communities, as in [37, 18]. For non-overlapping community detection, we use Normalized Mutual Information (NMI) [28] and Modularity. Note that Modularity does not utilize ground truth data. For node classification, Micro-F1 and Macro-F1 are used.\n\n5.3 Comparative Methods\n\nFor overlapping community detection, we choose four competitive baselines: BigCLAM [36], a nonnegative matrix factorization approach based on the Bernoulli-Poisson link that only considers the graph structure; CESNA [37], an extension of BigCLAM that additionally models the generative process for node attributes; Circles [20], a generative model of edges w.r.t. 
attribute similarity to\ndetect communities; and SVI [10], a Bayesian model for graphs with overlapping communities that\nuses a mixed-membership stochastic blockmodel.\n\nTo evaluate node embedding and non-overlapping community detection, we compare our method with\nthe \ufb01ve baselines: MF [32], which represents each vertex with a low-dimensional vector obtained\nthrough factoring the adjacency matrix; DeepWalk [24], a method that adopts truncated random walk\nand Skip-Gram to learn vertex embeddings; LINE [26], which aims to preserve the \ufb01rst-order and\nsecond-order proximity among vertices in the graph; Node2vec [11], which adopts biased random\nwalk and Skip-Gram to learn vertex embeddings; and ComE [4], which uses a Gaussian mixture\nmodel to learn an embedding and clustering jointly using random walk features.\n\n5.4 Experiment Con\ufb01guration\n\nFor all baseline methods, we use the implementations provided by their authors and use the default\nparameters. For methods that only output representations of vertices, we apply K-means to the\n\n6\n\n\fTable 2: Evaluation (in terms of F1-Score and Jaccard Similarity) on networks with overlapping\nground-truth communities. NA means the task is not completed in 24 hours. 
In order to evaluate\nthe effectiveness of smoothness regularization, we show the result of our model with (vGraph+) and\nwithout the regularization.\n\nF1-score\n\nJaccard\n\nDataset\n\nfacebook0\n\nfacebook107\nfacebook1684\nfacebook1912\nfacebook3437\nfacebook348\nfacebook3980\nfacebook414\nfacebook686\nfacebook698\n\nYoutube\nAmazon\n\nDblp\n\nCoauthor-CS\n\nBigclam CESNA Circles\n0.2948\n0.2860\n0.3928\n0.2467\n0.2894\n0.5041\n0.2617\n0.3493\n0.1009\n0.1986\n0.5175\n0.4964\n0.3274\n0.3203\n0.4843\n0.5886\n0.5036\n0.3825\n0.3515\n0.5423\n0.3600\n0.4370\n0.5330\n0.4640\n0.2360\n0.3830\n\n0.2806\n0.3733\n0.5121\n0.3474\n0.2009\n0.5375\n0.3574\n0.6007\n0.3900\n0.5865\n0.3840\n0.4680\n0.3590\n0.4200\n\nNA\nNA\n\nSVI\n\n0.2810\n0.2689\n0.3591\n0.2804\n0.1544\n0.4607\n\nNA\n\n0.3893\n0.4639\n0.4031\n0.4140\n0.4730\n\nNA\n\n0.4070\n\nvGraph\n0.2440\n0.2817\n0.4232\n0.2579\n0.2087\n0.5539\n0.4450\n0.6471\n0.4775\n0.5396\n0.5070\n0.5330\n0.3930\n0.4980\n\nvGraph+\n0.2606\n0.3178\n0.4379\n0.3750\n0.2267\n0.5314\n0.4150\n0.6693\n0.5379\n0.5950\n0.5220\n0.5320\n0.3990\n0.5020\n\nBigclam CESNA Circles\n0.1846\n0.1862\n0.2752\n0.1547\n0.1871\n0.3801\n0.1672\n0.2412\n0.0545\n0.1148\n0.3927\n0.3586\n0.2426\n0.2097\n0.3418\n0.4713\n0.3615\n0.2504\n0.2255\n0.4192\n0.2207\n0.2929\n0.3505\n0.3671\n0.1384\n0.2409\n\n0.1725\n0.2695\n0.3871\n0.2394\n0.1165\n0.4001\n0.2645\n0.4732\n0.2534\n0.4588\n0.2416\n0.3502\n0.2226\n0.2682\n\nNA\nNA\n\nSVI\n\n0.1760\n0.1719\n0.2467\n0.2010\n0.0902\n0.3360\n\nNA\n\n0.2931\n0.3394\n0.3002\n0.2867\n0.3643\n\nNA\n\n0.2972\n\nvGraph\n0.1458\n0.1827\n0.2917\n0.1855\n0.1201\n0.4099\n0.3376\n0.5184\n0.3272\n0.4356\n0.3434\n0.3689\n0.2501\n0.3517\n\nvGraph+\n0.1594\n0.2170\n0.3272\n0.2796\n0.1328\n0.4050\n0.2933\n0.5587\n0.3856\n0.4771\n0.3480\n0.3693\n0.2505\n0.3432\n\nlearned embeddings to get non-overlapping communities. Results report are averaged over 5 runs.\nNo node attributes are used in all our experiments. 
We generate node attributes using node degree features for those methods that require node attributes, such as CESNA [37] and Circles [20]. It is hard to compare the quality of community results when the numbers of communities are different for different methods. Therefore, we set the number of communities to be detected, K, to the number of ground-truth communities for all methods, as in [18]. For vGraph, we use full-batch training when the dataset is small enough. Otherwise, we use stochastic training with a batch size of 5000 or 10000 edges. The initial learning rate is set to 0.05 and is decayed by 0.99 after every 100 iterations. We use the Adam optimizer and train for 5000 iterations. When smoothness regularization is used, \u03bb is set to 100. For community detection, the model with the lowest loss is chosen. For node classification, we evaluate node embeddings after 1000 iterations of training. The dimension of node embeddings is set to 128 in all experiments for all methods. For the node classification task, we randomly select 70% of the labels for training and use the rest for testing.\n\n5.5 Results\n\nTable 2 shows the results on overlapping community detection. Some of the methods are not very scalable and cannot obtain results within 24 hours on some larger datasets. Compared with these studies, vGraph outperforms all baseline methods on 11 out of 14 datasets in terms of F1-score or Jaccard Similarity, as it is able to leverage useful representations at the node level. Moreover, vGraph is also very efficient on these datasets, since we employ variational inference and parameterize the model with node and community embeddings. By adding the smoothness regularization term (vGraph+), we see a further increase in performance, which shows that our method can be combined with concepts from traditional community detection methods.\n\nThe results for non-overlapping community detection are presented in Table 3. 
vGraph outperforms all conventional node embeddings + K-Means in 4 out of 6 datasets in terms of NMI and outperforms them on all 6 datasets in terms of modularity. ComE, another framework that jointly solves node embedding and community detection, also generally performs better than other node embedding methods + K-Means. This supports our claim that learning these two tasks collaboratively instead of sequentially can further enhance performance. Compared to ComE, vGraph performs better in 4 out of 6 datasets in terms of NMI and in 5 out of 6 datasets in terms of modularity. This shows that vGraph can also outperform frameworks that learn node representations and communities together.\n\nTable 4 shows the results for the node classification task. vGraph significantly outperforms all the baseline methods in 9 out of the 12 dataset-metric combinations. The reason is that most baseline methods only consider the local graph information without modeling the global semantics. vGraph solves this problem by representing node embeddings as a mixture of communities to incorporate global context.\n\nTable 3: Evaluation (in terms of NMI and Modularity) on networks with non-overlapping ground-truth communities.\n\nNMI\nDataset MF deepwalk LINE node2vec ComE vGraph\ncornell 0.0632 0.0789 0.0697 0.0712 0.0732 0.0803\ntexas 0.0562 0.0684 0.1289 0.0655 0.0772 0.0809\nwashington 0.0599 0.0752 0.0910 0.0538 0.0504 0.0649\nwisconsin 0.0530 0.0759 0.0680 0.0749 0.0689 0.0852\ncora 0.2673 0.3387 0.2202 0.3157 0.3660 0.3445\nciteseer 0.0552 0.1190 0.0340 0.1592 0.2499 0.1030\n\nModularity\nDataset MF deepwalk LINE node2vec ComE vGraph\ncornell 0.4220 0.4055 0.2372 0.4573 0.5748 0.5792\ntexas 0.2835 0.3443 0.1921 0.3926 0.4856 0.4636\nwashington 0.3679 0.1841 0.1655 0.4311 0.4862 0.5169\nwisconsin 0.3892 0.3384 0.1651 0.5338 0.5500 0.5706\ncora 0.6711 0.6398 0.4832 0.5392 0.7010 0.7358\nciteseer 0.6963 0.6819 0.4014 0.4657 0.7324 0.7711\n\nTable 4: 
Results of node classi\ufb01cation on 6 datasets.\nMacro-F1\n\nMicro-F1\n\nDatasets\nCornell\nTexas\n\nWashington\nWisconsin\n\nCora\n\nCiteseer\n\nMF\n13.05\n8.74\n15.88\n14.77\n11.29\n14.59\n\nDeepWalk LINE Node2Vec ComE vGraph\n29.76\n26.00\n30.36\n29.91\n16.23\n17.88\n\n19.86\n15.46\n15.80\n14.63\n12.88\n12.88\n\n20.70\n14.95\n21.23\n18.47\n10.52\n16.68\n\n21.78\n16.33\n13.99\n19.06\n11.86\n15.99\n\n22.69\n21.32\n18.45\n23.44\n13.21\n16.17\n\nMF\n15.25\n14.03\n15.94\n18.75\n12.79\n15.79\n\nDeepWalk LINE Node2Vec ComE vGraph\n37.29\n47.37\n34.78\n35.00\n24.35\n20.42\n\n25.42\n33.33\n33.33\n32.50\n28.04\n19.42\n\n23.73\n27.19\n25.36\n28.12\n14.59\n16.80\n\n33.05\n40.35\n34.06\n38.75\n22.32\n19.01\n\n24.58\n25.44\n28.99\n25.00\n27.74\n20.82\n\n5.6 Visualization\n\nIn order to gain more insight, we present visualizations of the facebook107 dataset in Fig. 2(a). To\ndemonstrate that our model can be applied to large networks, we present results of vGraph on a co-\nauthorship network with around 100,000 nodes and 330,000 edges in Fig. 2(b). More visualizations\nare available in appendix B. We can observe that the community structure, or \u201csocial context\u201d, is\nre\ufb02ected in the corresponding node embedding (node positions in both visualizations are determined\nby t-SNE of the node embeddings). To demonstrate the hierarchical extension of our model, we\nvisualize a subset of the co-authorship dataset in Fig. 3. We visualize the \ufb01rst-tier communities\nand second-tier communities in panel (a) and (b) respectively. We can observe that the second-tier\ncommunities grouped under the same \ufb01rst-tier communities interact more with themselves than they\ndo with other second-tier communities.\n\n6 Conclusion\n\nIn this paper, we proposed vGraph, a method that performs overlapping (and non-overlapping)\ncommunity detection and learns node and community embeddings at the same time. vGraph casts\nthe generation of edges in a graph as an inference problem. 
To encourage collaboration between community detection and node representation learning, we assume that each node can be represented by a mixture of communities, and each community is defined as a multinomial distribution over nodes. We also design a smoothness regularizer in the latent space to encourage neighboring nodes to be similar. Empirical evaluation on 20 different benchmark datasets demonstrates the effectiveness of the proposed method on both tasks compared to competitive baselines. Furthermore, our model is readily extendable to detect hierarchical communities.

Figure 2: In panel (a) we visualize the result on the facebook107 dataset using vGraph; in panel (b) we visualize the result on the Dblp-full dataset using vGraph. The coordinates of the nodes are determined by t-SNE of the node embeddings.

Figure 3: We visualize the result on a subset of the Dblp dataset using two-level hierarchical vGraph. The coordinates of the nodes are determined by t-SNE of the node embeddings. In panel (a) we visualize the first-tier communities; in panel (b), the second-tier communities; in panel (c), the corresponding hierarchical tree structure.

Acknowledgments

This project is supported by the Natural Sciences and Engineering Research Council of Canada, as well as the Canada CIFAR AI Chair Program.
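As a pointer to how the "node embedding + K-Means" baselines in Table 3 are scored, the pipeline can be sketched as follows. The toy two-community data and the scikit-learn calls are illustrative assumptions, not the exact experimental setup; modularity is computed analogously from the induced partition on the original graph:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)

# Toy stand-in for learned node embeddings: two planted communities,
# embedded as noisy community-indicator vectors.
true_labels = np.array([0, 0, 0, 1, 1, 1])
emb = np.eye(2)[true_labels] + 0.1 * rng.normal(size=(6, 2))

# Cluster the embeddings, then score the partition against ground truth.
pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
nmi = normalized_mutual_info_score(true_labels, pred)  # in [0, 1]
```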