{"title": "Semi-Implicit Graph Variational Auto-Encoders", "book": "Advances in Neural Information Processing Systems", "page_first": 10712, "page_last": 10723, "abstract": "Semi-implicit graph variational auto-encoder (SIG-VAE) is proposed to expand the flexibility of variational graph auto-encoders (VGAE) to model graph data. SIG-VAE employs a hierarchical variational framework to enable neighboring node sharing for better generative modeling of graph dependency structure, together with a Bernoulli-Poisson link decoder. Not only does this hierarchical construction provide a more flexible generative graph model to better capture real-world graph properties, but it also naturally leads SIG-VAE to semi-implicit hierarchical variational inference that allows faithful modeling of implicit posteriors of given graph data, which may exhibit heavy tails, multiple modes, skewness, and rich dependency structures. SIG-VAE integrates a carefully designed generative model, well suited to model real-world sparse graphs, and a sophisticated variational inference network, which propagates the graph structural information and distribution uncertainty to capture complex posteriors. SIG-VAE clearly outperforms a simple combination of VGAE with variational inference, including semi-implicit variational inference (SIVI) or normalizing flow (NF), which does not propagate uncertainty in its inference network, and provides more interpretable latent representations than VGAE does. 
Extensive experiments with a variety of graph data show that SIG-VAE significantly outperforms state-of-the-art methods on several different graph analytic tasks.", "full_text": "Semi-Implicit Graph Variational Auto-Encoders

Arman Hasanzadeh†∗, Ehsan Hajiramezanali†∗, Nick Duffield†, Krishna Narayanan†, Mingyuan Zhou‡, Xiaoning Qian†
† Department of Electrical and Computer Engineering, Texas A&M University
{armanihm, ehsanr, duffieldng, krn, xqian}@tamu.edu
‡ McCombs School of Business, The University of Texas at Austin
mingyuan.zhou@mccombs.utexas.edu

Abstract

Semi-implicit graph variational auto-encoder (SIG-VAE) is proposed to expand the flexibility of variational graph auto-encoders (VGAE) to model graph data. SIG-VAE employs a hierarchical variational framework to enable neighboring node sharing for better generative modeling of graph dependency structure, together with a Bernoulli-Poisson link decoder. Not only does this hierarchical construction provide a more flexible generative graph model to better capture real-world graph properties, but it also naturally leads SIG-VAE to semi-implicit hierarchical variational inference that allows faithful modeling of implicit posteriors of given graph data, which may exhibit heavy tails, multiple modes, skewness, and rich dependency structures. SIG-VAE integrates a carefully designed generative model, well suited to model real-world sparse graphs, and a sophisticated variational inference network, which propagates the graph structural information and distribution uncertainty to capture complex posteriors. SIG-VAE clearly outperforms a simple combination of VGAE with variational inference, including semi-implicit variational inference (SIVI) or normalizing flow (NF), which does not propagate uncertainty in its inference network, and provides more interpretable latent representations than VGAE does. 
Extensive experiments with a variety of graph data show that SIG-VAE significantly outperforms state-of-the-art methods on several different graph analytic tasks.

1 Introduction

Analyzing graph data is an important machine learning task with a wide variety of applications. Transportation networks, social networks, gene co-expression networks, and recommendation systems are a few example datasets that can be modeled as graphs, where each node represents an agent (e.g., road intersection, person, or gene) and the edges capture the interactions between the agents. The main challenge in analyzing graph datasets for link prediction, clustering, or node classification is how to exploit graph structural information in the model. Graph representation learning aims to summarize the graph structural information by a feature vector in a low-dimensional latent space, which can be used in downstream analytic tasks.
While the vast majority of existing methods assume that each node is embedded to a deterministic point in the latent space [5, 2, 25, 30, 14, 15, 10, 4], modeling uncertainty is of crucial importance in many applications, including physics and biology. For example, when link prediction in knowledge graphs is used to drive expensive pharmaceutical experiments, it would be beneficial to know the confidence level of the model in its predictions. To address this, variational graph auto-encoder (VGAE) [18] embeds each node to a random variable in the latent space.

∗ Both authors contributed equally.
33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Despite its popularity, 1) the Gaussian assumption imposed on the variational distribution restricts its variational inference flexibility when the true posterior distribution given a graph clearly violates the Gaussian assumption; and 2) the adopted inner-product decoder restricts its generative model flexibility. 
While a recent study tries to address the first problem by changing the prior distribution, it does not show much practical success [11]; the latter problem is, to the best of our knowledge, not well studied yet.
Inspired by the recently developed semi-implicit variational inference (SIVI) [39] and normalizing flow (NF) [27, 17, 23], which offer the appealing combination of flexible posterior distributions and effective optimization, we propose a hierarchical variational graph framework for node embedding of graph-structured data, notably increasing the expressiveness of the posterior distribution for each node in the latent space. SIVI enriches mean-field variational inference with a flexible (implicit) mixing distribution. NF transforms a simple Gaussian random variable through a sequence of invertible differentiable functions with tractable Jacobians. While NF restricts the mixing distribution in the hierarchy to have an explicit probability density function, SIVI does not impose such a constraint. Both SIVI and NF can model complex posterior distributions, which helps when the underlying true embedded node distribution exhibits heavy tails and/or multiple modes. We further argue that the graph structure cannot be fully exploited by the posterior distribution from a trivial combination of SIVI and/or NF with VGAE, which does not integrate graph neighborhood information. 
Moreover, such a combination does not address the flexibility of the generative model, the second VGAE problem stated above.
To address the aforementioned issues, instead of explicitly choosing the posterior distribution family as in previous works [18, 11], our hierarchical variational framework adopts a stochastic generative node embedding model that can learn implicit posteriors while maintaining simple optimization. Specifically, we introduce a semi-implicit hierarchical construction to model the posterior distribution that best fits both the graph topology and the node attributes of given graphs. With SIVI, even if the posterior is not tractable, its density can be evaluated with Monte Carlo estimation, enabling efficient model inference on top of highly enhanced model flexibility and expressive power. Our semi-implicit graph variational auto-encoder (SIG-VAE) can well model heavy tails, skewness, multimodality, and other characteristics that are exhibited by the posterior but fail to be captured by existing VGAEs. Furthermore, a Bernoulli-Poisson link function [41] is adopted in the decoder of SIG-VAE to increase the flexibility of the generative model and better capture the properties of real-world networks, which are often sparse. SIG-VAE facilitates end-to-end learning for the various graph analytic tasks evaluated in our experiments. For link prediction, SIG-VAE consistently outperforms state-of-the-art methods by a large margin. It is also comparable with the state of the art when modified to perform two additional tasks, node classification and graph clustering, even though node classification is more naturally solved by supervised learning methods. We further show that the new decoder is able to generate sparse random graphs whose statistics closely resemble those of real-world graph data. These results clearly demonstrate the practical value of SIG-VAE. 
The implementation of our proposed model is accessible at https://github.com/sigvae/SIGraphVAE.

2 Background

Variational graph auto-encoder (VGAE). Many node embedding methods derive deterministic latent representations [14, 15, 10]. By extending the variational auto-encoder (VAE) notion to graphs, Kipf and Welling [18] propose to solve the following problem by embedding the nodes to Gaussian random vectors in the latent space.
Problem 1. Given a graph G = (V, E) with the adjacency matrix A and M-dimensional node attributes X ∈ R^{N×M}, find the probability distribution of the latent representation of nodes Z ∈ R^{N×L}, i.e., p(Z | X, A).
Finding the true posterior, p(Z | X, A), is often difficult and intractable. In Kipf and Welling [18], it is approximated by a Gaussian distribution, q(Z | ψ) = ∏_{i=1}^{N} q_i(z_i | ψ_i), with q_i(z_i | ψ_i) = N(z_i | ψ_i) and ψ_i = {μ_i, diag(σ_i²)}. Here, μ_i and σ_i are L-dimensional mean and standard deviation vectors corresponding to node i, respectively. The parameters of q(Z | ψ), i.e., ψ = {ψ_i}_{i=1}^{N}, are modeled and learned using two graph convolutional neural networks (GCNs) [19]. More precisely, μ = GCN_μ(X, A) and log σ = GCN_σ(X, A), where μ and σ are the matrices stacking the μ_i's and σ_i's, respectively. Given Z, the decoder in VGAE is a simple inner-product decoder, p(A_{ij} = 1 | z_i, z_j) = sigmoid(z_i z_j^T).
The parameters of the model are found by optimizing the well-known evidence lower bound (ELBO) [16, 7, 8, 33]: L = E_{q(Z | ψ)}[log p(A | Z)] − KL[q(Z | ψ) || p(Z)]. Note that q(Z | ψ) here is equivalent to q(Z | X, A). Despite the promising results shown by VGAE, a well-known issue in variational inference is underestimating the variance of the posterior. 
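As a concrete reference for the encoder and decoder just described, the following minimal NumPy sketch wires up a two-layer GCN encoder (shared first layer, separate mean and log-std heads), the reparameterization step, and the inner-product decoder. The toy graph, feature matrix, and randomly drawn weights are illustrative stand-ins, not the paper's TensorFlow implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def gcn_layer(A_hat, X, W):
    """One GCN propagation step: normalized adjacency x features x weights."""
    return np.tanh(A_hat @ X @ W)

def vgae_encode(A, X, W1, W_mu, W_sigma):
    """Sketch of the VGAE encoder: GCN_mu and GCN_sigma share a first layer,
    then output per-node mean and log-std of a diagonal Gaussian posterior."""
    # Symmetrically normalized adjacency with self-loops, as in Kipf & Welling.
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    A_hat = A_tilde / np.sqrt(np.outer(d, d))
    H = gcn_layer(A_hat, X, W1)
    mu = A_hat @ H @ W_mu             # mu = GCN_mu(X, A)
    log_sigma = A_hat @ H @ W_sigma   # log sigma = GCN_sigma(X, A)
    return mu, log_sigma

def reparameterize(mu, log_sigma):
    return mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)

def inner_product_decode(Z):
    """p(A_ij = 1 | z_i, z_j) = sigmoid(z_i . z_j)."""
    return 1.0 / (1.0 + np.exp(-(Z @ Z.T)))

# Toy 4-node graph with identity features (no attributes).
A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], float)
X = np.eye(4)
W1 = 0.1 * rng.standard_normal((4, 8))
W_mu = 0.1 * rng.standard_normal((8, 2))
W_sigma = 0.1 * rng.standard_normal((8, 2))
mu, log_sigma = vgae_encode(A, X, W1, W_mu, W_sigma)
Z = reparameterize(mu, log_sigma)
P = inner_product_decode(Z)
assert P.shape == (4, 4) and np.all((P > 0) & (P < 1))
```

The symmetric output P reflects that the inner-product decoder scores undirected edges; training would maximize the ELBO above over W1, W_mu, and W_sigma.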
This underestimation stems from the mismatch between the representation power of the variational family to which q is restricted and the complexity of the true posterior, in addition to the use of the KL divergence, which is asymmetric, to measure how different q is from the true posterior.

Semi-implicit variational inference (SIVI). To well characterize the posterior while maintaining simple optimization, semi-implicit variational inference (SIVI) has been proposed by Yin and Zhou [39]; it is also related to hierarchical variational inference [26] and auxiliary deep generative models [21] (see Yin and Zhou [39] for more details about their connections and differences). It has been shown that SIVI can capture complex posterior distributions, such as multimodal or skewed distributions, which cannot be captured by vanilla VI due to its restricted exponential-family assumption over both the prior and posterior in the latent space. SIVI assumes that ψ, the parameters of the posterior, are drawn from an implicit distribution rather than being analytic. This hierarchical construction enables flexible mixture modeling and allows more complex posteriors while maintaining simple optimization for model inference. More specifically, Z ∼ q(Z | ψ) and ψ ∼ q_φ(ψ), with φ denoting the distribution parameters to be inferred. Marginalizing ψ out leads to random variables Z drawn from a distribution family H indexed by the variational parameters φ, expressed as

H = { h_φ(Z) : h_φ(Z) = ∫_ψ q(Z | ψ) q_φ(ψ) dψ }.   (1)

The importance of the semi-implicit formulation is that while the conditional posterior q(Z | ψ) is explicit and analytic, the marginal distribution h_φ(Z) is often implicit. Note that if q_φ equals a delta function, then h_φ is an explicit distribution. 
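The hierarchical sampling scheme behind (1) is easy to simulate. The sketch below is a hypothetical one-dimensional example: ψ is produced by pushing reparameterizable noise through a nonlinear (implicit) map, then z is drawn from the explicit Gaussian layer. The resulting marginal h_φ(z) is bimodal even though every conditional q(z | ψ) is Gaussian.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_semi_implicit(n):
    """Hierarchical sampling: psi ~ q_phi(psi) (implicit: noise pushed through
    a nonlinear map), then z ~ q(z | psi) (explicit Gaussian). The marginal
    h_phi(z) = E_psi[q(z | psi)] has no closed form but is trivial to sample."""
    eps = rng.standard_normal(n)                      # reparameterizable noise
    psi = np.where(eps > 0, 3.0, -3.0) + 0.3 * eps    # implicit mixing: bimodal psi
    z = psi + 0.5 * rng.standard_normal(n)            # explicit layer: N(z | psi, 0.5^2)
    return z

z = sample_semi_implicit(20_000)
# The marginal is far wider than any single conditional and centered near zero,
# because the two modes near +3 and -3 roughly balance.
assert z.std() > 2.0
assert abs(z.mean()) < 0.2
```

Evaluating h_φ(z) itself would require the intractable integral in (1), which is exactly why SIVI optimizes a lower bound instead of the exact ELBO.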
Unlike regular variational inference, which assumes independent latent dimensions, semi-implicit variational inference does not impose such a constraint. This enables semi-implicit variational distributions to model very complex multivariate distributions.
Since the marginal probability density function h_φ(Z) is often intractable, SIVI derives a lower bound for the ELBO, as follows, to optimize the variational parameters:

L = E_{Z∼h_φ(Z)}[log (p(Y, Z) / h_φ(Z))]
  = −KL(E_{ψ∼q_φ(ψ)}[q(Z | ψ)] || p(Z | Y)) + log p(Y)
  ≥ −E_{ψ∼q_φ(ψ)}[KL(q(Z | ψ) || p(Z | Y))] + log p(Y)
  = E_{ψ∼q_φ(ψ)}[E_{Z∼q(Z | ψ)}[log (p(Y, Z) / q(Z | ψ))]] = L̲(q(Z | ψ), q_φ(ψ)),   (2)

where Y denotes the observations. The inequality E_ψ[KL(q(Z | ψ) || p(Z | Y))] ≥ KL(E_ψ[q(Z | ψ)] || p(Z | Y)) has been used to derive L̲. Optimizing this lower bound, however, could drive the mixing distribution q_φ(ψ) towards a point mass density. To address this degeneracy issue, SIVI adds a nonnegative regularization term, leading to a surrogate ELBO that is asymptotically exact [39]. We further discuss this in the supplementary material.

Normalizing flow (NF). NF [23] also enriches the posterior distribution families. Compared to SIVI, NF imposes explicit density functions for the mixing distributions in the hierarchy, while SIVI only requires q_φ to be reparameterizable. 
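For contrast, a planar flow (one member of the NF families cited above) keeps the density explicit at the price of a restricted transform. The small NumPy sketch below, with toy parameter values, shows one planar transform f(z) = z + u·tanh(wᵀz + b) and its tractable log-determinant, which is what makes the change-of-variables density update cheap.

```python
import numpy as np

def planar_flow(z, w, b, u):
    """One planar NF transform f(z) = z + u * tanh(w.z + b) and its
    log|det Jacobian|; for this family det(I + u psi^T) = 1 + u.psi."""
    a = np.tanh(z @ w + b)                 # shape (n,)
    f = z + np.outer(a, u)                 # invertible for suitable u, w
    psi = np.outer(1 - a**2, w)            # derivative of tanh term w.r.t. z
    logdet = np.log(np.abs(1 + psi @ u))   # tractable by matrix determinant lemma
    return f, logdet

rng = np.random.default_rng(2)
z0 = rng.standard_normal((5, 2))           # samples from the base Gaussian q0
w, b, u = np.array([1.0, -0.5]), 0.1, np.array([0.4, 0.2])
z1, logdet = planar_flow(z0, w, b, u)
# Change of variables: ln q1(z1) = ln q0(z0) - ln|det df/dz|,
# accumulated over the whole chain of K flows.
log_q0 = -0.5 * (z0**2).sum(axis=1) - np.log(2 * np.pi)
log_q1 = log_q0 - logdet
assert z1.shape == z0.shape and logdet.shape == (5,)
```

The explicit-density requirement is visible here: every transform in the chain must expose a computable log-determinant, whereas a SIVI mixing distribution can be any reparameterizable sampler.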
SIVI's weaker requirement of mere reparameterizability makes it more flexible, especially when used for graph analytics as explained in the next section, since the SIVI posterior can be generated by transforming random noise through any flexible function, for example a neural network.

3 Baselines: Variational Inference with VGAE

Before presenting our semi-implicit graph variational auto-encoder (SIG-VAE), we first introduce two baseline methods that directly combine SIVI and NF with VGAE.
SIVI-VGAE. To address Problem 1 while well characterizing the posterior with modeling flexibility in the VGAE framework, the naive solution is to take the semi-implicit variational distribution in SIVI for modeling latent variables in VGAE, following the hierarchical formulation

Z ∼ q(Z | ψ),  ψ ∼ q_φ(ψ | X, A),   (3)

by introducing the implicit prior distribution parametrized by ψ, which can be sampled from the reparameterizable q_φ(ψ | X, A). Such a hierarchical semi-implicit construction not only leads to flexible mixture modeling of the posterior but also enables efficient model inference, for example with φ parameterized by deep neural networks. In this framework, the features from multiple layers of GNNs can be aggregated and then transformed via multiple fully connected layers after being concatenated with random noise to derive the posterior distribution for each node separately. More specifically, SIVI-VGAE injects random noise at C different stochastic fully connected layers for each node independently:

h_u = GNN_u(A, CONCAT(X, h_{u−1})),  for u = 1, …, L,  h_0 = 0,
ℓ_t^{(i)} = T^t(ℓ_{t−1}^{(i)}, ε_t, h_L^{(i)}),  where ε_t ∼ q_t(ε) for t = 1, …, C,  ℓ_0^{(i)} = 0,
μ_i(A, X) = g_μ(ℓ_C^{(i)}, h_L^{(i)}),  Σ_i(A, X) = g_Σ(ℓ_C^{(i)}, h_L^{(i)}),
q(Z | A, X, μ, Σ) = ∏_{i=1}^{N} q(z_i | A, X, μ_i, Σ_i),  q(z_i | A, X, μ_i, Σ_i) = N(μ_i(A, X), Σ_i(A, X)),

where T^t, g_μ, and g_Σ are all deterministic neural networks, i is the node index, L is the number of GNN layers, and ε_t is random noise drawn from the distribution q_t. Note that in the equations above, GNN can be any type of existing graph neural network, such as the graph convolutional neural network (GCN) [19], GCN with Chebyshev filters [13], GraphSAGE [15], jumping knowledge (JK) networks [36], or the graph isomorphism network (GIN) [37]. Given the GNN_L output h_L, μ_i(A, X) and Σ_i(A, X) are now random variables rather than taking deterministic values as in the vanilla VAE. Constructed this way, however, the implicit distributions may not completely capture the dependency between neighboring nodes. We consider SIVI-VGAE a naive version of our proposed SIG-VAE (and call it Naive SIG-VAE in the rest of the paper), which is specifically designed with neighborhood sharing to capture complex dependency structures in networks, as detailed in the next section. Please also note that the first layer of SIVI can be integrated with NF rather than a simple Gaussian; we leave that for future study.
NF-VGAE. It is also possible to increase VGAE model flexibility with other existing variational inference methods, for example NF. However, NF requires deterministic transform functions whose Jacobians are easy to compute, which limits the flexibility when considering complex dependency structures in graph analytic tasks. We have constructed such a non-Gaussian VGAE, i.e., an NF-based variational graph auto-encoder (NF-VGAE), as follows:

h_u = GNN_u(A, CONCAT(X, h_{u−1})),  for u = 1, …, L,  h_0 = 0,   (4)
μ(A, X) = GNN_μ(A, CONCAT(X, h_L)),  Σ(A, X) = GNN_Σ(A, CONCAT(X, h_L)),
q_0(Z^{(0)} | A, X) = ∏_{i=1}^{N} q_0(z_i^{(0)} | A, X),  with q_0(z_i^{(0)} | A, X) = N(μ_i, diag(σ_i²)),
q_K(Z^{(K)} | A, X) = ∏_{i=1}^{N} q_K(z_i^{(K)} | A, X),
ln q_K(z_i^{(K)} | A, X) = ln q_0(z_i^{(0)}) − ∑_k ln |det ∂f_k/∂z_i^{(k)}|,

where the posterior distribution q_K(Z^{(K)} | A, X) is obtained by successively transforming a Gaussian random variable Z^{(0)} with distribution q_0 through a chain of K invertible differentiable transformations f_k : R^d → R^d. We further discuss this in the supplementary material. NF-VGAE is a two-step inference method that 1) starts with Gaussian random variables and then 2) transforms them through a series of invertible mappings. We emphasize again that in NF-VGAE, the GNN output layers are deterministic, without neighborhood distribution sharing, due to the deterministic nature of the initial density parameters in q_0.

4 Semi-implicit graph variational auto-encoder (SIG-VAE)

While the above two models are able to approximate more flexible and complex posteriors, such trivial combinations may fail to fully exploit the graph dependency structure because they are not capable of propagating uncertainty between neighboring nodes. To enable effective uncertainty propagation, which is the essential factor in capturing complex posteriors with graph data, we develop a carefully designed generative model, SIG-VAE, to better integrate variational inference and VGAE with a natural neighborhood sharing scheme.
To have tractable posterior inference, we construct SIG-VAE using a hierarchy of multiple stochastic layers. Specifically, the first stochastic layer q(Z | X, A) is reparameterizable and has an analytic probability density function. The layers added after it are reparameterizable and computationally efficient to sample from. 
More specifically, we adopt a hierarchical encoder in SIG-VAE that injects random noise at L different stochastic layers:

h_u = GNN_u(A, CONCAT(X, ε_u, h_{u−1})),  where ε_u ∼ q_u(ε) for u = 1, …, L,  h_0 = 0,   (5)
μ(A, X) = GNN_μ(A, CONCAT(X, h_L)),  Σ(A, X) = GNN_Σ(A, CONCAT(X, h_L)),   (6)
q(Z | A, X, μ, Σ) = ∏_{i=1}^{N} q(z_i | A, X, μ_i, Σ_i),  q(z_i | A, X, μ_i, Σ_i) = N(μ_i(A, X), Σ_i(A, X)).

Note that in the equations above μ and Σ are random variables, and thus q(Z | X, A) is not necessarily Gaussian after marginalization; ε_u is N-dimensional random noise drawn from a distribution q_u, and q_u is chosen such that the samples drawn from it are of the same type as X; for example, if X is categorical, Bernoulli is a good choice for q_u. By concatenating the random noise and node attributes, the outputs of the GNNs are random variables rather than deterministic vectors. Their expressive power is inherited by SIG-VAE to go beyond Gaussian, exponential family, or von Mises-Fisher [11] posterior distributions for the derived latent representations.
In SIG-VAE, when inferring each node's latent posterior, we incorporate the distributions of the neighboring nodes, better capturing graph dependency structure than sharing deterministic features from GNNs. More specifically, the input to our model at stochastic layer u is CONCAT(X, ε_u), so the outputs of the subsequent stochastic layers give mixing distributions that integrate information from neighboring nodes (Fig. 1).

Figure 1: SIG-VAE diffuses the distributions of the neighboring nodes, which is more informative than sharing deterministic features, to infer each node's latent distribution.

The flexibility of SIG-VAE, directly working on the stochastic distribution parameters in (5)-(6), allows neighborhood sharing to achieve better performance in graph analytic tasks. 
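A minimal NumPy sketch of the stochastic encoder in (5)-(6) follows; the toy graph, layer sizes, and weights are illustrative assumptions (the paper's implementation uses GCN modules in TensorFlow). The point it demonstrates is that because noise ε_u is concatenated with X at every GNN layer, μ and Σ come out as random variables, so two forward passes give different μ.

```python
import numpy as np

rng = np.random.default_rng(3)

def gnn(A_hat, H, W):
    return np.tanh(A_hat @ H @ W)

def sigvae_encode(A, X, Ws, W_mu, W_sigma, noise_dim=2):
    """Sketch of the SIG-VAE stochastic encoder (Eqs. 5-6): noise eps_u is
    concatenated with X at every layer, so neighboring nodes mix
    *distributions* rather than fixed features."""
    N = A.shape[0]
    A_tilde = A + np.eye(N)
    d = A_tilde.sum(axis=1)
    A_hat = A_tilde / np.sqrt(np.outer(d, d))
    h = np.zeros((N, Ws[0].shape[0] - X.shape[1] - noise_dim))  # h_0 = 0
    for W in Ws:                                    # u = 1, ..., L stochastic layers
        eps = rng.standard_normal((N, noise_dim))   # eps_u ~ q_u
        h = gnn(A_hat, np.concatenate([X, eps, h], axis=1), W)
    inp = np.concatenate([X, h], axis=1)
    mu = A_hat @ inp @ W_mu                         # random, because h is random
    log_sigma = A_hat @ inp @ W_sigma
    return mu, np.exp(log_sigma)

N, M, noise_dim, hid, latent = 4, 4, 2, 6, 2
A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], float)
X = np.eye(N)
Ws = [0.1 * rng.standard_normal((M + noise_dim + hid, hid)) for _ in range(2)]
W_mu = 0.1 * rng.standard_normal((M + hid, latent))
W_sigma = 0.1 * rng.standard_normal((M + hid, latent))
mu1, _ = sigvae_encode(A, X, Ws, W_mu, W_sigma)
mu2, _ = sigvae_encode(A, X, Ws, W_mu, W_sigma)
assert mu1.shape == (N, latent) and not np.allclose(mu1, mu2)  # mu is stochastic
```

Contrast this with the NF-VGAE sketch of (4): there, the GNN layers see only X, so μ and Σ are deterministic and no distributional information is diffused between neighbors.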
We argue that the uncertainty propagation in our carefully designed SIG-VAE, which is an outcome of using GNNs and adding noise to the input in equations (5)-(6), is the key factor in capturing more faithful and complex posteriors. Note that (5) is different from the NF-VGAE construction (4), where the GNN output layers are deterministic. Through experiments, we show that this uncertainty neighborhood sharing is key for SIG-VAE to achieve superior graph analysis performance.
We further argue that increasing the flexibility of variational inference alone is not enough to better model real-world graph data, as the optimal solution of the generative model does not change. In SIG-VAE, the Bernoulli-Poisson link [41] is adopted for the decoder to further increase the expressiveness of the generative model. Potential extensions with other decoders can be integrated with SIG-VAE if needed. Let A_{ij} = δ(m_{ij} > 0) with m_{ij} ∼ Poisson(exp(∑_{k=1}^{L} r_k z_{ik} z_{jk})), and hence

p(A | Z, R) = ∏_{i=1}^{N} ∏_{j=1}^{N} p(A_{ij} | z_i, z_j, R),  p(A_{ij} = 1 | z_i, z_j, R) = 1 − exp(−exp(∑_{k=1}^{L} r_k z_{ik} z_{jk})),   (7)

where R ∈ R_+^{L×L} is a diagonal matrix with diagonal elements r_k.

4.1 Inference

To derive the ELBO for model inference in SIG-VAE, we must take into account the fact that ψ has to be drawn from a distribution. Hence, the ELBO moves beyond that of the simple VGAE:

L = −KL(E_{ψ∼q_φ(ψ | X, A)}[q(Z | ψ)] || p(Z)) + E_{ψ∼q_φ(ψ | X, A)}[E_{Z∼q(Z | ψ)}[log p(A | Z)]],   (8)

where h_φ is as defined in (1). The marginal probability density function h_φ(Z | X, A) is often intractable, so direct Monte Carlo estimation of the ELBO L is prohibited. 
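The Bernoulli-Poisson decoder in (7) amounts to a couple of array operations; a minimal NumPy sketch with toy Z and r (not values from a trained model):

```python
import numpy as np

def bernoulli_poisson_decode(Z, r):
    """Edge probabilities under the Bernoulli-Poisson link (Eq. 7):
    A_ij = 1 iff a latent Poisson count m_ij > 0, which gives
    p(A_ij = 1 | z_i, z_j, R) = 1 - exp(-exp(sum_k r_k z_ik z_jk))."""
    S = (Z * r) @ Z.T            # sum_k r_k z_ik z_jk for all node pairs
    return 1.0 - np.exp(-np.exp(S))

rng = np.random.default_rng(4)
Z = rng.standard_normal((5, 3))
r = np.array([1.0, 0.5, 0.25])   # diagonal of R
P = bernoulli_poisson_decode(Z, r)
assert P.shape == (5, 5) and np.all((P > 0) & (P < 1))
assert np.allclose(P, P.T)       # symmetric, like the inner-product decoder
```

Compared with sigmoid(z_i·z_j), the double exponential drives the probability of weakly interacting pairs toward zero much faster, which is what makes this link a better fit for the sparse graphs discussed above.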
To address this intractability and infer the variational parameters of SIG-VAE, we can derive a lower bound for the ELBO as follows (see the supplementary material for more details):

L̲ = −E_{ψ∼q_φ(ψ | X, A)}[KL(q(Z | ψ) || p(Z))] + E_{ψ∼q_φ(ψ | X, A)}[E_{Z∼q(Z | ψ)}[log p(A | Z)]] ≤ L.

Further implementation details and the derivation of the surrogate ELBO can be found in the supplementary material.

Figure 2: Swiss roll graph (left) and its latent representation using SIG-VAE (middle) and VGAE (right). The latent representations (middle and right) are heat maps in R^3. We expect the embedding of the Swiss roll graph with the inner-product decoder to be a curved plane in R^3, which is clearly captured better by SIG-VAE.

Figure 3: Latent representation distributions of five example nodes from the Swiss roll graph using SIG-VAE (blue) and VGAE (red). SIG-VAE clearly infers more complex distributions that can be multi-modal, skewed, and with sharp and steep changes. This helps SIG-VAE to better represent the nodes in the latent space.

5 Experiments

We test the performance of SIG-VAE on different graph analytic tasks: 1) interpretability of SIG-VAE compared to VGAE, 2) link prediction in various real-world graph datasets, including graphs with and without node attributes, 3) graph generation, and 4) node classification in citation graphs with labels. In all of the experiments, GCN [19] is adopted for all the GNN modules in SIG-VAE, Naive SIG-VAE, and NF-VGAE, implemented in TensorFlow [1]. The PyGSP package [12] is used to generate synthetic graphs. 
Implementation details for all the experiments, together with graph data statistics, can be found in the supplementary material.

5.1 Interpretable latent representations

Figure 4: The nodes with multi-modal posteriors (red nodes) reside between different communities in the Swiss roll graph.

We first demonstrate the expressiveness of SIG-VAE by illustrating the approximated variational distributions of node latent representations. We show that SIG-VAE captures the graph structure better and has a more interpretable embedding than VGAE on a generated Swiss roll graph with 200 nodes and 1244 edges (Fig. 2). In order to provide a fair comparison, both models share an identical implementation with the inner-product decoder and the same number of parameters. We simply use the identity matrix I_N as node attributes and choose the latent space dimension to be three in this experiment. This graph has a simple plane-like structure. As the inner-product decoder assumes that the information is embedded in the angle between latent vectors, we expect the node embedding to map the nodes of the Swiss roll graph onto a curve in the latent space. 
As we can see in Fig. 2, SIG-VAE derives a clearly more interpretable planar latent structure than VGAE. We also show the posterior distributions of five randomly selected nodes from the graph in Fig. 3.

Table 1: Link prediction performance in networks with node attributes.

Method        | Cora AUC     | Cora AP      | Citeseer AUC | Citeseer AP  | Pubmed AUC   | Pubmed AP
SC [31]       | 84.6 ± 0.01  | 88.5 ± 0.00  | 80.5 ± 0.01  | 85.0 ± 0.01  | 84.2 ± 0.02  | 87.8 ± 0.01
DW [25]       | 83.1 ± 0.01  | 85.0 ± 0.00  | 80.5 ± 0.02  | 83.6 ± 0.01  | 84.4 ± 0.00  | 84.1 ± 0.00
GAE [18]      | 91.0 ± 0.02  | 92.0 ± 0.03  | 89.5 ± 0.04  | 89.9 ± 0.05  | 96.4 ± 0.00  | 96.5 ± 0.00
VGAE [18]     | 91.4 ± 0.01  | 92.6 ± 0.01  | 90.8 ± 0.02  | 92.0 ± 0.02  | 94.4 ± 0.02  | 94.7 ± 0.02
S-VGAE [11]   | 94.10 ± 0.1  | 94.10 ± 0.3  | 94.70 ± 0.2  | 95.20 ± 0.2  | 96.00 ± 0.1  | 96.00 ± 0.1
SEAL [40]     | 90.09 ± 0.1  | 83.01 ± 0.3  | 83.56 ± 0.2  | 77.58 ± 0.2  | 96.71 ± 0.1  | 90.10 ± 0.1
G2G [9]       | 92.10 ± 0.9  | 92.58 ± 0.8  | 95.32 ± 0.7  | 95.57 ± 0.7  | 94.28 ± 0.3  | 93.38 ± 0.5
NF-VGAE       | 92.42 ± 0.6  | 93.08 ± 0.5  | 91.76 ± 0.3  | 93.04 ± 0.8  | 96.59 ± 0.3  | 96.68 ± 0.4
Naive SIG-VAE | 93.97 ± 0.5  | 93.29 ± 0.4  | 94.25 ± 0.8  | 93.60 ± 0.9  | 96.53 ± 0.7  | 96.01 ± 0.5
SIG-VAE (IP)  | 94.37 ± 0.1  | 94.41 ± 0.1  | 95.90 ± 0.1  | 95.46 ± 0.1  | 96.73 ± 0.1  | 96.67 ± 0.1
SIG-VAE       | 96.04 ± 0.04 | 95.82 ± 0.06 | 96.43 ± 0.02 | 96.32 ± 0.02 | 97.01 ± 0.07 | 97.15 ± 0.04
As we can see, SIG-VAE is capable of inferring complex distributions. The inferred distributions can be multi-modal, skewed, non-symmetric, and exhibit sharp and steep changes. These complex distributions help the model obtain a more realistic embedding that captures the intrinsic graph structure. To explain why multi-modality may arise, we used Asynchronous Fluid [24] to visualize the Swiss roll graph, highlighting detected communities with different colors in Fig. 4. Note that we used a different layout from the one in Fig. 2 to better visualize the communities in the graph. The three red (two orange) nodes are the nodes with multi-modal (skewed) distributions in Fig. 3. These nodes with multi-modal posteriors reside between different communities; hence, with some probability, they could be assigned to multiple communities. The supplementary material contains additional results and discussions with a torus graph, with similar observations.

5.2 Accurate link prediction

We further conduct extensive experiments for link prediction with various real-world graph datasets. Our results show that SIG-VAE significantly outperforms well-known baselines and state-of-the-art methods on all benchmark datasets. We consider two types of datasets, i.e., datasets with node attributes and datasets without attributes. We preprocess and split the datasets as done in Kipf and Welling [18], with validation and test sets containing 5% and 10% of network links, respectively. We learn the model parameters for 3500 epochs with learning rate 0.0005, using the validation set for early stopping. The latent space dimension is set to 16. The hyperparameters of SIG-VAE, Naive SIG-VAE, and NF-VGAE are the same for all the datasets. For fair comparison, all methods have a similar number of parameters to the default VGAE. The supplementary material contains further implementation details. 
We measure the performance by average precision (AP) and area under the\nROC curve (AUC) based on 10 runs on a test set of previously removed links in these graphs.\nWith node attributes. We consider three graph datasets with node attribbutes\u2014Citeseer, Cora, and\nPubmed [28]. The number of node attributes for these dataset are 3703, 1433, and 500 respectively.\nOther statistics of the datasets are summarized in the supplement Table 1. We compare the results\nof SIG-VAE, Naive SIG-VAE, and NF-VGAE with six state-of-the-art methods, including spectral\nclustering (SC), DeepWalk (DW) [25] , GAE [18], VGAE [18], S-VGAE [11], and SEAL [40].\nThe inner-product decoder is also adopted in SIG-VAE to clearly demonstrate the advantages of the\nsemi-implicit hierarchical variational distribution for the encoder.\nWe use the same hyperparameters for the competing methods as stated in [40, 18, 11]. As we can see\nin Table 1, SIG-VAE shows signi\ufb01cant improvement in terms of both AUC and AP over state-of-the-\nart methods. Note the standard deviation of SIG-VAE is also smaller compared to other methods,\nindicating stable semi-implicit variational inference. Compared to the baseline VGAE, more \ufb02exible\nposterior in three proposed methods SIGVAE (with both inner-product and Bernoulli-Poisson link\ndecoders), Naive SIG-VAE, and NF-VGAE can clearly improve the link prediction accuracy. This\nsuggests that the Gaussian assumption does not hold for these graph structured data. The performance\nimprovement of SIG-VAE with inner-product decoder (IP) over Naive SIG-VAE and NF-VGAE\nclearly demonstrates the advantages of neighboring node sharing, especially in the smaller graphs.\nEven for the large graph Pubmed, on which VGAE performs similar to S-VGAE, our SIG-VAE still\nachieves the highest link prediction accuracy, showing the importance of all modeling components\n\n7\n\n\fTable 2: AUC and AP of link prediction in networks without node attributes. 
* indicates that the numbers are reported from Zhang and Chen [40]. The supplementary material contains the complete result tables with standard deviation values.

Metric  Data    MF*    SBM*   N2V*   LINE*   SC*    GAE    VGAE*  SEAL*  G2G    NF-VGAE  N-SIG-VAE  SIG-VAE(IP)  SIG-VAE
AUC     USAir   94.08  91.44  94.85  81.47   74.22  89.28  93.09  97.09  97.56  94.22    92.17      95.74        94.52
        NS      74.55  91.52  92.30  80.63   89.94  94.04  93.14  97.71  98.75  98.00    98.18      98.38        99.17
        Yeast   90.28  91.41  93.67  87.45   93.25  93.74  93.88  97.20  98.11  93.36    97.34      97.86        98.32
        Power   50.63  76.22  66.57  55.637  91.78  71.20  72.21  84.18  95.04  93.67    91.35      94.61        96.23
        Router  78.03  65.46  85.65  67.15   68.79  61.51  55.73  95.68  95.94  92.66    85.98      93.56        96.13
AP      USAir   94.36  89.71  95.08  79.70   78.07  89.27  95.14  95.70  97.50  94.48    90.22      96.27        94.95
        NS      78.41  94.28  92.13  85.17   90.83  95.83  95.26  98.12  98.53  97.83    97.43      98.52        99.24
        Yeast   92.01  94.90  92.73  90.55   94.63  95.19  95.34  97.95  97.97  94.24    97.83      98.18        98.41
        Power   53.50  81.49  65.48  56.66   91.00  75.91  77.13  86.69  96.50  93.80    92.29      95.76        97.28
        Router  82.59  84.67  68.66  71.92   73.53  67.50  70.36  95.66  94.94  92.80    86.28      95.88        96.86

in the proposed method, including the non-Gaussian posterior, the use of the neighborhood distribution, and the sparse Bernoulli-Poisson link decoder.

Without node attributes. We further consider five graph datasets without node attributes: USAir, NS [22], Router [29], Power [34], and Yeast [32]. The data statistics are summarized in the supplement Table 1.
We compare the performance of our models with eight competing state-of-the-art methods: matrix factorization (MF), stochastic block model (SBM) [3], node2vec (N2V) [14], LINE [30], spectral clustering (SC), VGAE [18], S-VGAE [11], and SEAL [40].

For the baseline methods, we use the same hyperparameters as stated in Zhang and Chen [40]. For datasets without node attributes, we use a two-stage learning process for SIG-VAE. First, the embedding of each node is learned in the 128-dimensional latent space while injecting 5-dimensional Bernoulli noise into the system. Then the learned embedding is taken as node features for the second stage to learn a 16-dimensional embedding while injecting 64-dimensional noise into SIG-VAE. Through empirical experiments, we found that this two-stage learning converges faster than end-to-end learning. We follow the same procedure for Naive SIG-VAE and NF-VGAE.

As we can see in Table 2, SIG-VAE again shows consistently superior performance compared to the competing methods, especially over the baseline VGAE, in both AUC and AP. It is interesting to note that, while the proposed Bernoulli-Poisson decoder works well for sparser graphs, especially the NS and Router datasets, SIG-VAE with the inner-product decoder shows superior performance for the USAir graph, which is much denser. Compared to the baseline VGAE, both Naive SIG-VAE and NF-VGAE improve the results by a large margin in both AUC and AP, showing the benefits of a more flexible posterior. Comparing SIG-VAE with the two other flexible inference methods shows that SIG-VAE is not only unrestricted by the Gaussian assumption, which is not a good fit for link prediction with the inner-product decoder [11], but also able to model a flexible posterior that accounts for graph topology. The results for link prediction on the Power graph clearly magnify this fact, as SIG-VAE improves the accuracy by 34% compared to VGAE.
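The contrast between the two decoders can be made concrete. Under a Bernoulli-Poisson link (in the spirit of the edge partition model of Zhou [41]), an edge indicates that a latent Poisson count is positive, so P(A_ij = 1) = 1 - exp(-sum_k r_k z_ik z_jk) for nonnegative embeddings, whereas VGAE's inner-product decoder uses sigma(z_i . z_j). A small numpy sketch (our own illustrative code, not the paper's implementation) shows why the former favors sparse graphs:

```python
import numpy as np

def bernoulli_poisson_edge_prob(z, r):
    """P(A_ij = 1) = 1 - exp(-sum_k r_k z_ik z_jk) for nonnegative
    embeddings z (n x K) and per-dimension rates r (K,)."""
    rate = (z * r) @ z.T                 # lambda_ij = sum_k r_k z_ik z_jk
    return 1.0 - np.exp(-rate)

def inner_product_edge_prob(z):
    """Sigmoid inner-product decoder of VGAE: P(A_ij = 1) = sigma(z_i . z_j)."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))

# Near-zero embeddings: the Bernoulli-Poisson decoder drives edge
# probabilities toward 0 (sparsity), while the inner-product decoder
# saturates at 0.5 for every pair.
z = np.full((4, 2), 1e-3)
r = np.ones(2)
print(bernoulli_poisson_edge_prob(z, r).mean())   # close to 0
print(inner_product_edge_prob(z).mean())          # close to 0.5
```

With larger rates the Bernoulli-Poisson probability still approaches 1, so dense neighborhoods remain representable; the point is that near-zero embeddings map to near-zero edge probabilities rather than to 0.5.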
The supplementary material contains the results with standard deviation values over different runs, again showing stability.

Ablation studies have also been run to evaluate SIG-VAE with the inner-product decoder for link prediction on citation graphs without using node attributes. The [AUC, AP] are [91.14, 90.99] for Cora and [88.72, 88.24] for Citeseer, lower than the values from SIG-VAE with attributes in Table 1 but still competitive against existing methods (even those with node attributes), showing the ability of SIG-VAE to utilize graph structure. While some of the methods, like SEAL, work well for graphs without node attributes and some others, like VGAE, achieve good performance for graphs with node attributes, SIG-VAE consistently achieves superior performance on both types of datasets. This is due to the fact that SIG-VAE can learn implicit distributions for nodes, which are very powerful in capturing graph structure even without any node attributes.

5.3 Graph generation

To further demonstrate the flexibility of SIG-VAE as a generative model, we have used the inferred embedding representations to generate new graphs. For example, SIG-VAE infers network parameters for Cora, whose density and average clustering coefficient are 0.00143 and 0.24, respectively. Using the inferred posterior and learned decoder, a new graph is generated with the corresponding r_k to see if its graph statistics are close to the original ones. Please note that we have shrunk inferred r_k's smaller than 0.01 to 0. The density and average clustering coefficient of this generated graph based on SIG-VAE are 0.00147 and 0.25, respectively, which are very close to those of the original graph. We also generate new graphs based on SIG-VAE with the inner-product decoder and VGAE. The density and average clustering coefficients of the generated graphs based on SIG-VAE (IP) and VGAE are the same, i.e.
0.1178 and 0.49, respectively, showing that the inner-product decoder may not be a good choice for sparse graphs. The supplementary material includes more examples.

5.4 Node classification & graph clustering

We also have applied SIG-VAE to node classification on citation graphs with labels by modifying the loss function to include graph reconstruction and semi-supervised classification terms. Results are summarized in Table 3. Our model exhibits strong generalization properties, highlighted by its competitive performance compared to the state-of-the-art methods, despite not being trained specifically for this task. To show the robustness of SIG-VAE to missing edges, we randomly removed 10%, 20%, 50%, and 70% of edges while keeping node attributes. The mean accuracies of 10 runs for Cora (2 layers, [32, 16]) are 79.5, 78.7, 75.3, and 60.6, respectively. The supplementary material contains additional results and discussion for graph clustering, again without specific model tuning.

Table 3: Summary of results in terms of classification accuracy (in percent).

Method          Cora  Citeseer  Pubmed
ManiReg [6]     59.5  60.1      70.7
SemiEmb [35]    59.0  59.6      71.1
LP [42]         68.0  45.3      63.0
DeepWalk [25]   67.2  43.2      65.3
ICA [20]        75.1  69.1      73.9
Planetoid [38]  75.7  64.7      77.2
GCN [19]        81.5  70.3      79.0
SIG-VAE         79.7  70.4      79.3

SIG-VAE has demonstrated state-of-the-art performance in link prediction and comparable results on other tasks, clearly showing the potential of SIG-VAE on different graph analytic tasks.

6 Conclusion

Combining the advantages of the semi-implicit hierarchical variational distribution and VGAE with a Bernoulli-Poisson link decoder, SIG-VAE is developed to enrich the representation power of the posterior distribution of node embeddings given graphs, so that both the graph structural and node attribute information can be best captured in the latent space.
By providing a surrogate evidence lower bound that is asymptotically exact, the optimization problem for SIG-VAE model inference is amenable to stochastic gradient descent, without compromising the flexibility of its variational distribution. Our experiments with different graph datasets have shown the promising capability of SIG-VAE in a range of graph analysis applications with interpretable latent representations, thanks to the hierarchical construction that diffuses the distributions of neighboring nodes in given graphs.

7 Acknowledgments

The presented materials are based upon work supported by the National Science Foundation under Grants ENG-1839816, IIS-1848596, CCF-1553281, IIS-1812641, and IIS-1812699. We also thank Texas A&M High Performance Research Computing and the Texas Advanced Computing Center for providing computational resources to perform the experiments in this work.

References

[1] Martin Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viegas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL http://tensorflow.org/. Software available from tensorflow.org.

[2] Amr Ahmed, Nino Shervashidze, Shravan Narayanamurthy, Vanja Josifovski, and Alexander J Smola. Distributed large-scale natural graph factorization. In Proceedings of the 22nd International Conference on World Wide Web, pages 37–48.
ACM, 2013.

[3] Edoardo M Airoldi, David M Blei, Stephen E Fienberg, and Eric P Xing. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9(Sep):1981–2014, 2008.

[4] Mohammadreza Armandpour, Patrick Ding, Jianhua Huang, and Xia Hu. Robust negative sampling for network embedding. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 3191–3198. AAAI, 2019.

[5] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in Neural Information Processing Systems, pages 585–591, 2002.

[6] Mikhail Belkin, Partha Niyogi, and Vikas Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7(Nov):2399–2434, 2006.

[7] Christopher M Bishop and Michael E Tipping. Variational relevance vector machines. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pages 46–53. Morgan Kaufmann Publishers Inc., 2000.

[8] David M Blei, Alp Kucukelbir, and Jon D McAuliffe. Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518):859–877, 2017.

[9] Aleksandar Bojchevski and Stephan Günnemann. Deep Gaussian embedding of graphs: Unsupervised inductive learning via ranking. In International Conference on Learning Representations, 2018.

[10] Haochen Chen, Bryan Perozzi, Yifan Hu, and Steven Skiena. HARP: Hierarchical representation learning for networks. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[11] Tim R Davidson, Luca Falorsi, Nicola De Cao, Thomas Kipf, and Jakub M Tomczak. Hyperspherical variational auto-encoders. arXiv preprint arXiv:1804.00891, 2018.

[12] Michael Defferrard, Lionel Martin, Rodrigo Pena, and Nathanaël Perraudin. PyGSP: Graph signal processing in Python.
URL https://github.com/epfl-lts2/pygsp/.

[13] Michael Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pages 3844–3852, 2016.

[14] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 855–864. ACM, 2016.

[15] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1024–1034, 2017.

[16] Michael I Jordan, Zoubin Ghahramani, Tommi S Jaakkola, and Lawrence K Saul. An introduction to variational methods for graphical models. Machine Learning, 37(2):183–233, 1999.

[17] Durk P Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. Improved variational inference with inverse autoregressive flow. In Advances in Neural Information Processing Systems, pages 4743–4751, 2016.

[18] Thomas N Kipf and Max Welling. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308, 2016.

[19] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, 2017.

[20] Qing Lu and Lise Getoor. Link-based classification. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pages 496–503, 2003.

[21] Lars Maaløe, Casper Kaae Sønderby, Søren Kaae Sønderby, and Ole Winther. Auxiliary deep generative models. In International Conference on Machine Learning, pages 1445–1453, 2016.

[22] Mark EJ Newman.
Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(3):036104, 2006.

[23] George Papamakarios, Theo Pavlakou, and Iain Murray. Masked autoregressive flow for density estimation. In Advances in Neural Information Processing Systems, pages 2338–2347, 2017.

[24] Ferran Parés, Dario Garcia Gasulla, Armand Vilalta, Jonatan Moreno, Eduard Ayguadé, Jesús Labarta, Ulises Cortés, and Toyotaro Suzumura. Fluid communities: A competitive, scalable and diverse community detection algorithm. In International Conference on Complex Networks and their Applications, pages 229–240. Springer, 2017.

[25] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 701–710. ACM, 2014.

[26] Rajesh Ranganath, Dustin Tran, and David Blei. Hierarchical variational models. In International Conference on Machine Learning, pages 324–333, 2016.

[27] Danilo Jimenez Rezende and Shakir Mohamed. Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770, 2015.

[28] Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi-Rad. Collective classification in network data. AI Magazine, 29(3):93, 2008.

[29] Neil Spring, Ratul Mahajan, and David Wetherall. Measuring ISP topologies with Rocketfuel. ACM SIGCOMM Computer Communication Review, 32(4):133–145, 2002.

[30] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, pages 1067–1077. International World Wide Web Conferences Steering Committee, 2015.

[31] Lei Tang and Huan Liu. Leveraging social media networks for classification.
Data Mining and Knowledge Discovery, 23(3):447–478, 2011.

[32] Christian Von Mering, Roland Krause, Berend Snel, Michael Cornell, Stephen G Oliver, Stanley Fields, and Peer Bork. Comparative assessment of large-scale data sets of protein–protein interactions. Nature, 417(6887):399, 2002.

[33] Martin J Wainwright, Michael I Jordan, et al. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1–2):1–305, 2008.

[34] Duncan J Watts and Steven H Strogatz. Collective dynamics of small-world networks. Nature, 393(6684):440, 1998.

[35] Jason Weston, Frederic Ratle, Hossein Mobahi, and Ronan Collobert. Deep learning via semi-supervised embedding. In Neural Networks: Tricks of the Trade, pages 639–655. Springer, 2012.

[36] Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi, and Stefanie Jegelka. Representation learning on graphs with jumping knowledge networks. In International Conference on Machine Learning, pages 5453–5462, 2018.

[37] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In International Conference on Learning Representations, 2019.

[38] Zhilin Yang, William W Cohen, and Ruslan Salakhutdinov. Revisiting semi-supervised learning with graph embeddings. arXiv preprint arXiv:1603.08861, 2016.

[39] Mingzhang Yin and Mingyuan Zhou. Semi-implicit variational inference. In International Conference on Machine Learning, pages 5660–5669, 2018.

[40] Muhan Zhang and Yixin Chen. Link prediction based on graph neural networks. arXiv preprint arXiv:1802.09691, 2018.

[41] Mingyuan Zhou. Infinite edge partition models for overlapping community detection and link prediction. In Artificial Intelligence and Statistics, pages 1135–1143, 2015.

[42] Xiaojin Zhu, Zoubin Ghahramani, and John D Lafferty.
Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pages 912–919, 2003.