{"title": "Completely random measures for modelling block-structured sparse networks", "book": "Advances in Neural Information Processing Systems", "page_first": 4260, "page_last": 4268, "abstract": "Statistical methods for network data often parameterize the edge-probability by attributing latent traits such as block structure to the vertices and assume exchangeability in the sense of the Aldous-Hoover representation theorem. These assumptions are however incompatible with traits found in real-world networks such as a power-law degree-distribution. Recently, Caron & Fox (2014) proposed the use of a different notion of exchangeability after Kallenberg (2005) and obtained a network model which permits edge-inhomogeneity, such as a power-law degree-distribution whilst retaining desirable statistical properties. However, this model does not capture latent vertex traits such as block-structure. In this work we re-introduce the use of block-structure for network models obeying Kallenberg\u2019s notion of exchangeability and thereby obtain a collapsed model which both admits the inference of block-structure and edge inhomogeneity. We derive a simple expression for the likelihood and an efficient sampling method. The obtained model is not significantly more difficult to implement than existing approaches to block-modelling and performs well on real network datasets.", "full_text": "Completely random measures for modelling\n\nblock-structured sparse networks\n\nTue Herlau Mikkel N. Schmidt Morten M\u00f8rup\n\nDTU Compute\n\nTechnical University of Denmark\n\nRichard Petersens plads 31,\n\n2800 Lyngby, Denmark\n\n{tuhe,mns,mmor}@dtu.dk\n\nAbstract\n\nStatistical methods for network data often parameterize the edge-probability by\nattributing latent traits such as block structure to the vertices and assume ex-\nchangeability in the sense of the Aldous-Hoover representation theorem. 
These assumptions are however incompatible with traits found in real-world networks such as a power-law degree-distribution. Recently, Caron & Fox (2014) proposed the use of a different notion of exchangeability after Kallenberg (2005) and obtained a network model which permits edge-inhomogeneity, such as a power-law degree-distribution whilst retaining desirable statistical properties. However, this model does not capture latent vertex traits such as block-structure. In this work we re-introduce the use of block-structure for network models obeying Kallenberg\u2019s notion of exchangeability and thereby obtain a collapsed model which both admits the inference of block-structure and edge inhomogeneity. We derive a simple expression for the likelihood and an efficient sampling method. The obtained model is not significantly more difficult to implement than existing approaches to block-modelling and performs well on real network datasets.\n\n1 Introduction\n\nTwo phenomena are generally considered important for modelling complex networks. The first is community or block structure, where the vertices are partitioned into non-overlapping blocks (denoted by \u2113 = 1, . . . , K in the following) and the probability that two vertices i, j are connected depends on their assignment to blocks:\n\n$$P(\\text{Edge between vertex } i \\text{ and } j) = \\xi_{\\ell m},$$\n\nwhere \u03be\u2113m \u2208 [0, 1] is a number only depending on the blocks \u2113, m to which i, j respectively belong. Stochastic block models (SBMs) were first proposed by White et al. (1976) and today form the basic starting point for many important link-prediction methods such as the infinite relational model (IRM) (Xu et al., 2006; Kemp et al., 2006).\nWhile block-structure is important for link prediction, the degree distribution of edges in complex networks is often found to follow a power-law (Newman et al., 2001; Strogatz, 2001). 
This realization has led to many important models of network growth, such as the preferential attachment (PA) model of Barab\u00e1si (1999).\nModels such as the IRM and the PA model have different goals. The PA model attempts to explain how network structure, such as the degree distribution, follows from simple rules of network growth and is not suitable for link prediction. In contrast, the IRM aims to discover latent block-structure and predict edges, tasks for which the PA model is unsuitable. In the following, network model will refer to a model with the same aims as the IRM, most notably prediction of missing edges.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\n1.1 Exchangeability\n\nInvariance is an important theme in Bayesian approaches to network modelling. For network data, the invariance which has received most attention is infinite exchangeability of random arrays. Suppose we represent the network as a subset of an infinite matrix A = (Aij)ij\u22651 such that Aij is the number of edges between vertex i and j (we will allow multi-edges and self-edges in the following). Infinite exchangeability of the random array (Aij)ij\u22651 is the requirement that (Hoover, 1979; Aldous, 1981) $(A_{ij})_{ij \\geq 1} \\overset{d}{=} (A_{\\sigma(i)\\sigma(j)})_{ij \\geq 1}$ for all finite permutations \u03c3 of N. The distribution of a finite network is then obtained by marginalization. According to the Aldous-Hoover theorem (Hoover, 1979; Aldous, 1981), an infinite exchangeable network has a representation in terms of a random function, and furthermore, the number of edges in the network must either scale as the square of the number of vertices or (with probability 1) be zero (Orbanz & Roy, 2015). Neither of these options is compatible with a power-law degree distribution and one is faced with the dilemma of giving up either the power-law distribution or exchangeability. 
It is the first horn of this dilemma which has been pursued by much work on Bayesian network modelling (Orbanz & Roy, 2015).\nIt is, however, possible to substitute the notion of infinite exchangeability in the above sense with a different definition due to Kallenberg (2005, chapter 9). The new notion retains many important characteristics of the former, including a powerful representation theorem parallelling the Aldous-Hoover theorem but expressed in terms of a random set. Important progress in exploring network models based on this representation has recently been made by Caron & Fox (2014), who demonstrate the ability to model power-law behaviour of the degree distribution and construct an efficient sampler for parameter inference. The reader is encouraged to consult this reference for more details.\nIn this paper, we will apply the ideas of Caron & Fox (2014) to block-structured network data, thereby obtaining a model based on the same structural invariance, yet able to capture both block-structure and degree heterogeneity. The contribution of this work is fourfold: (i) we propose a general extension of sparse networks to allow latent structure, (ii) using this construction we implement a block-structured network model which obeys Kallenberg\u2019s notion of exchangeability, (iii) we derive a collapsed expression of the posterior distribution which allows efficient sampling, and (iv) we demonstrate that the resulting model offers superior link prediction compared to both standard block-modelling and the model of Caron & Fox (2014).\nIt should be noted that independently of this manuscript, Veitch & Roy (2015) introduced a construction similar to our eq. 
(4) but focusing on the statistical properties of this type of random process, whereas this manuscript focuses on the practical implementation of network models based on the construction.\n\n2 Methods\n\nBefore introducing the full method we will describe the construction informally, omitting details relating to completely random measures.\n\n2.1 A simple approach to sparse networks\nSuppose the vertices in the network are labelled by real numbers in R+. An edge e (edges are considered directed and we allow for self-edges) then consists of two numbers $(x_{e1}, x_{e2}) \\in \\mathbb{R}_+^2$ denoted the edge endpoints. A network X of L edges (possibly L = \u221e) is simply the collection of points $X = ((x_{e1}, x_{e2}))_{e=1}^{L} \\subset \\mathbb{R}_+^2$. We adopt the convention that multi-edges imply duplicates in the list of edges. Suppose X is generated by a Poisson process with base measure \u03be on $\\mathbb{R}_+^2$:\n\n$$X \\sim \\mathrm{PP}(\\xi). \\tag{1}$$\n\nA finite network X\u03b1 can then be obtained by considering the restriction of X to $[0, \\alpha]^2$: $X_\\alpha = X \\cap [0, \\alpha]^2$. As an illustration, suppose \u03be is the Lebesgue measure. The number of edges is then $L \\sim \\mathrm{Poisson}(\\alpha^2)$ and the edge-endpoints xe1, xe2 are i.i.d. uniform on [0, \u03b1], simply corresponding to selecting L random points in $[0, \\alpha]^2$. The edges are indicated by the gray squares in figure 1a and the vertices as circles. Notice the vertices will be distinct with probability 1 and the procedure therefore gives rise to the degenerate but sparse network of 2L vertices and L edges, shown in figure 1a.\n\n(a) Maximally sparse network (b) Nontrivial network (c) Nontrivial network\nFigure 1: (Left:) A network is generated by randomly selecting points from $[0, \\alpha]^2 \\subset \\mathbb{R}_+^2$ corresponding to edges (squares) and identifying the unique coordinates with vertices (circles), giving the maximally disconnected graph. (Middle:) The edges are restricted to lie at the intersection of randomly generated gray lines at \u03b8i, each with a mass/sociability parameter wi. The probability of selecting an intersection is proportional to wiwj, giving a non-trivial network structure. (Right:) Each vertex is assigned a latent trait zi (the assignment to blocks as indicated by the colors) that modulates the edge probability with a parameter \u03b7\u2113m \u2265 0, thus allowing block-structured networks.\n\nTo generate non-trivial networks, the edge-endpoints must coincide with nonzero probability. Similar to Caron & Fox (2014), suppose the coordinates are restricted to only take a countable number of potential values, \u03b81, \u03b82, \u00b7\u00b7\u00b7 \u2208 R+ and each value has an associated sociability (or mass) parameter w1, w2, \u00b7\u00b7\u00b7 \u2208 [0,\u221e[ (we use the shorthand $(\\theta_i)_i = (\\theta_i)_{i=1}^{\\infty}$ for a series). If we define the measure $\\mu = \\sum_{i \\geq 1} w_i \\delta_{\\theta_i}$ and let \u03be = \u00b5 \u00d7 \u00b5, then when X\u03b1 is generated according to the procedure of eqn. (1), the number of edges L is $\\mathrm{Poisson}(T^2)$ distributed, where $T = \\mu([0, \\alpha]) = \\sum_{i=1}^{\\infty} w_i$. The position of the edges remains identically distributed, but with probability proportional to wiwj of selecting coordinate (\u03b8i, \u03b8j). Since the edge-endpoints coincide with non-zero probability this procedure allows the generation of a non-trivial associative network structure, see figure 1b. With proper choice of (wi, \u03b8i)i\u22651 these networks exhibit many desirable properties, such as a power-law degree distribution and sparsity (Caron & Fox, 2014).\nThis process can be intuitively extended to block-structured networks, as illustrated in figure 1c. There, each vertex is assigned a latent trait (i.e. a block assignment), here highlighted by the colors. We use the symbol zi \u2208 {1, . . . , K} to indicate the assignment of vertex i to one of the K blocks. We can then consider a measure of the form\n\n$$\\xi = \\sum_{i,j \\geq 1} \\eta_{z_i z_j} w_i w_j \\delta_{(\\theta_i, \\theta_j)} = \\sum_{\\ell, m = 1}^{K} \\eta_{\\ell m} \\mu_\\ell \\times \\mu_m, \\tag{2}$$\n\nwhere we have introduced $\\mu_\\ell = \\sum_{i : z_i = \\ell} w_i \\delta_{\\theta_i}$. Defined in this manner, \u03be is a measure on $[0, \\alpha]^2$ and \u03b7\u2113m parameterizes the interaction strength between community \u2113 and m. Notice the number of edges L\u2113m between block \u2113 and m is, by basic properties of the Poisson process, distributed as L\u2113m \u223c Poisson(\u03b7\u2113mT\u2113Tm), where T\u2113 = \u00b5\u2113([0, \u03b1]). In figure 1c the locations \u03b8i of the vertices have been artificially ordered according to color for easy visualization. The following section will show the connection between the above construction of eq. (2) and the exchangeable representation due to Kallenberg (2005). However, for greater generality, we will let the latent trait be a general continuous parameter ui \u2208 [0, 1] and later show that block-structured models can be obtained as a special case.\n\nFigure 2: (Step 1:) The potential vertex locations, \u03b8i, latent traits ui and sociability parameters wi are generated using a generalized gamma process. (Step 2:) The interaction of the latent traits $f : [0, 1]^2 \\to \\mathbb{R}_+$, the graphon, is chosen to be a piece-wise constant function. (Step 3:) Together, these determine the random measure \u03be which is used to generate the network from a Poisson process.\n\n2.2 Exchangeability and point-process network models\n\nSince the networks in the point-set representation are determined by the properties of the measure \u03be, invariance (i.e. exchangeability) of random point-set networks is defined as invariance of this random measure. Recall that infinite exchangeability for infinite matrices requires the distribution of the random matrix to be unchanged by permutation of the rows/columns. For a random measure on $\\mathbb{R}_+^2$, the corresponding requirement is that it should be possible to partition R+ into intervals I1, I2, I3, . . . , permute the intervals, and have the random measure be invariant to this permutation. Formally, a random measure \u03be on $\\mathbb{R}_+^2$ is then said to be jointly exchangeable if $\\xi \\circ (\\varphi \\otimes \\varphi)^{-1} \\overset{d}{=} \\xi$ for all measure-preserving transformations \u03d5 of R+. According to Kallenberg (2005, theorem 9.24), this is ensured provided the measure has a representation of the form:\n\n$$\\xi = \\sum_{i,j \\geq 1} h(\\zeta, x_i, x_j) \\delta_{(\\theta_i, \\theta_j)}, \\tag{3}$$\n\nwhere h is a measurable function, \u03b6 is a random variable and {(xi, \u03b8i)}i\u22651 is a unit rate Poisson process on $\\mathbb{R}_+^2$ (the converse involves five additional terms (Kallenberg, 2005)). In this representation, the locations (\u03b8i)i and the parameters (xi)i are decoupled, however we are free to select the random parameters (xi)i\u22651 to lie in a more general space than R+. 
Specifically, we define\n\n$$x_i = (u_i, v_i) \\in [0, 1] \\times \\mathbb{R}_+,$$\n\nwith the interpretation that each vi corresponds to a random mass wi through a transformation wi = g(vi), and each ui \u2208 [0, 1] is a general latent trait of the vertex. (In figure 1 this parameter corresponded to the assignment to blocks.) We then consider the following choice:\n\n$$h(\\zeta, x_i, x_j) = f(u_i, u_j) g_{z_i}(v_i) g_{z_j}(v_j), \\tag{4}$$\n\nwhere $f : [0, 1]^2 \\to \\mathbb{R}_+$ is a measurable function playing a similar role as the graphon in the Aldous-Hoover representation, and {(ui, vi, \u03b8i)}i\u22651 follows a unit-rate Poisson process on $[0, 1] \\times \\mathbb{R}_+^2$.\nTo see the connection with the block-structured model, suppose the function f is a piece-wise constant function\n\n$$f(u, u') = \\sum_{\\ell, m = 1}^{K} \\eta_{\\ell m} 1_{J_\\ell}(u) 1_{J_m}(u'),$$\n\nwhere $J_\\ell = \\left[ \\sum_{m=1}^{\\ell-1} \\beta_m, \\sum_{m=1}^{\\ell} \\beta_m \\right[$, $\\sum_{\\ell=1}^{K} \\beta_\\ell = 1$, \u03b2\u2113 > 0 and zi = \u2113 denotes the event $1_{J_\\ell}(u_i) = 1$. Notice this choice for f is exactly equivalent to the graphon for the block-structured network model in the Aldous-Hoover representation (Orbanz & Roy, 2015). The procedure is illustrated in figure 2. Realizations of networks generated by this process using different values of K can be obtained using the simulation methods of Caron & Fox (2014) and can be seen in figure 3. Notice the K = 1, \u03b711 = 1 case corresponds to their method.\nTo fully define the method we must first introduce the relevant prior for the measure $\\mu = \\sum_{i \\geq 1} w_i \\delta_{(\\theta_i, u_i)}$. As a prior we will use the Generalized Gamma-process (GGP) (Hougaard, 1986). In the following section, we will briefly review properties of completely random measures and use these to derive a simple expression of the posterior.\n\n2.3 Random measures\n\nAs a prior for \u00b5 we will use completely random measures (CRMs) and the reader is referred to (Kallenberg, 2005; Kingman, 1967) for a comprehensive account. Recall first the definition of a CRM. Assume S is a separable complete metric space with the Borel \u03c3-field B(S) (for our purpose S = [0, \u03b1]). A random measure \u00b5 is a random variable whose values are measures on S. For each measurable set A \u2208 B(S), the random measure induces a random variable \u00b5(A), and the random measure \u00b5 will be said to be completely random if for any finite collection A1, . . . , An of disjoint measurable sets the random variables \u00b5(A1), . . . , \u00b5(An) are independent. 
It was shown by Kingman (1967) that the non-trivial part of any completely random measure \u00b5 is almost surely discrete with a representation\n\n$$\\mu = \\sum_{i=1}^{\\infty} w_i \\delta_{\\theta_i}, \\tag{5}$$\n\nwhere the sequence of masses and locations (wi, \u03b8i)i (also known as the atoms) is a Poisson random measure on R+ \u00d7 S, with mean measure \u03bd known as the L\u00e9vy intensity measure. We will consider homogeneous CRMs, where locations are independent, \u03bd(dw, d\u03b8) = \u03c1(dw)\u03ba\u03b1(d\u03b8), and assume \u03ba\u03b1 is the Lebesgue measure on [0, \u03b1].\n\nFigure 3: (Top:) Example of four randomly generated networks for K = 1, 2, 3 and 4 using the choice of random measure discussed in section 2.3. The other parameters were fixed at \u03b1 = 20K, \u03c4 = 1, \u03c3 = 0.5 and \u03bba = \u03bbb = 1. Vertices have been sorted according to their assignment to blocks and sociability parameters. (Bottom:) The same networks as above but applying a random permutation to the edges within each tile. A standard SBM assumes a network structure of this form.\n\nSince the construction as outlined in figure 1c depends on sampling the edge start and end-points at random from the locations (\u03b8i)i, with probability proportional to wi, the normalized form of eqn. (5) will be of particular interest. Specifically, the chance of selecting a particular location from a random draw is governed by\n\n$$P = \\frac{\\mu}{T} = \\sum_{i=1}^{\\infty} p_i \\delta_{\\theta_i}, \\quad p_i = \\frac{w_i}{T}, \\quad T = \\mu(S) = \\sum_{i=1}^{\\infty} w_i, \\tag{6}$$\n\nwhich is known as the normalized random measure (NRM) and T is the total mass of the CRM \u00b5 (Kingman, 1967). A random draw from a Poisson process based on the CRM can thus be realized by first sampling the number of generated points, L \u223c Poisson(T), and then drawing their locations in an i.i.d. manner from the NRM of eqn. (6). 
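
As a rough illustration of the two-step draw just described, the following Python sketch samples points from a Poisson process driven by a CRM. It assumes the CRM has already been truncated to a finite list of atoms, and the names poisson_draw and sample_points_from_crm are hypothetical; they are not taken from the authors' code.

```python
import math
import random


def poisson_draw(mean, rng):
    # Knuth's multiplication method. Assumption: the mean is small enough
    # (roughly below 700) that exp(-mean) does not underflow to zero.
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_points_from_crm(weights, locations, rng):
    # weights[i] is the mass w_i and locations[i] the location theta_i of a
    # finite (truncated) set of atoms; the truncation is an assumption.
    total_mass = sum(weights)                    # T = mu(S)
    num_points = poisson_draw(total_mass, rng)   # L ~ Poisson(T)
    # rng.choices normalises the weights, so each draw uses p_i = w_i / T.
    return rng.choices(locations, weights=weights, k=num_points)


rng = random.Random(0)
points = sample_points_from_crm([0.5, 1.5, 3.0], [0.1, 0.4, 0.9], rng)
print(len(points), points)
```

Because rng.choices normalises its weights internally, the probabilities pi = wi / T never need to be formed explicitly.
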
The reader is referred to James (2002) for a comprehensive treatment on NRMs.\nWith the notation in place, we can provide the final form of the generative process for a network X\u03b1. Suppose the CRM \u00b5 (restricted to the region [0, \u03b1]) has been generated. Assume zi = \u2113 iff ui \u2208 J\u2113 and define the K thinned measures on [0, \u03b1] as:\n\n$$\\mu_\\ell = \\sum_{i : z_i = \\ell} w_i \\delta_{\\theta_i},$$\n\neach with total mass T\u2113 = \u00b5\u2113([0, \u03b1]). By basic properties of CRMs, the thinned measures are also CRMs (Pitman, 2006). The number of points in each tile L\u2113m is then Poisson(\u03b7\u2113mT\u2113Tm) distributed, and given L\u2113m the edge-endpoints (xe1\u2113, xe2m) between atoms in measure \u2113 and m can then be drawn from the corresponding NRM. The generative process is then simply:\n\n$$\\begin{aligned} (\\beta_\\ell)_{\\ell=1}^{K} &\\sim \\mathrm{Dirichlet}(\\beta_0/K, \\ldots, \\beta_0/K), & \\mu &\\sim \\mathrm{CRM}(\\rho, U_{[0,1]} \\times U_{\\mathbb{R}_+}), \\\\ \\eta_{\\ell m} &\\overset{iid}{\\sim} \\mathrm{Gamma}(\\lambda_a, \\lambda_b), & L_{\\ell m} &\\sim \\mathrm{Poisson}(\\eta_{\\ell m} T_\\ell T_m), \\\\ \\text{for } e = 1, \\ldots, L_{\\ell m}: \\quad x_{e1\\ell} &\\overset{iid}{\\sim} \\mathrm{Categorical}((w_i/T_\\ell)_{z_i=\\ell}), & x_{e2m} &\\overset{iid}{\\sim} \\mathrm{Categorical}((w_j/T_m)_{z_j=m}). \\end{aligned}$$\n\nIn the following we will use the generalized gamma process (GGP) as the choice of L\u00e9vy intensity measure (James, 2002). 
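
The generative process listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the atoms (weights and block assignments) are taken as given and finite, whereas in the model they come from a CRM and a Dirichlet-distributed partition of [0, 1]. The helper names poisson_draw and sample_block_network are hypothetical.

```python
import math
import random


def poisson_draw(mean, rng):
    # Knuth's multiplication method; assumes a small mean (roughly below 700).
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_block_network(weights, blocks, eta, rng):
    # weights[i]: sociability w_i of atom i (finite truncation is an assumption)
    # blocks[i]:  block label z_i in 0..K-1
    # eta[l][m]:  interaction strength between block l and block m
    num_blocks = len(eta)
    members = [[i for i, z in enumerate(blocks) if z == l] for l in range(num_blocks)]
    mass = [sum(weights[i] for i in group) for group in members]  # T_l
    edges = []
    for l in range(num_blocks):
        for m in range(num_blocks):
            if mass[l] <= 0.0 or mass[m] <= 0.0:
                continue  # an empty block cannot receive edge endpoints
            # Number of edges in tile (l, m): Poisson(eta_lm * T_l * T_m)
            count = poisson_draw(eta[l][m] * mass[l] * mass[m], rng)
            # Endpoints are drawn with probability w_i / T_l and w_j / T_m
            heads = rng.choices(members[l], weights=[weights[i] for i in members[l]], k=count)
            tails = rng.choices(members[m], weights=[weights[j] for j in members[m]], k=count)
            edges.extend(zip(heads, tails))
    return edges


rng = random.Random(1)
edges = sample_block_network(
    weights=[1.0, 2.0, 0.5, 1.5],
    blocks=[0, 0, 1, 1],
    eta=[[1.0, 0.1], [0.1, 1.0]],
    rng=rng,
)
print(len(edges), edges[:5])
```

Repeated pairs in the returned list correspond to multi-edges, matching the convention of section 2.1.
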
The GGP is parameterized with two parameters \u03c3, \u03c4 and has the functional form\n\n$$\\rho_{\\sigma,\\tau}(dw) = \\frac{1}{\\Gamma(1 - \\sigma)} w^{-1-\\sigma} e^{-\\tau w} dw.$$\n\nThe parameters (\u03c3, \u03c4) will be restricted to lie in the region ]0, 1[\u00d7[0,\u221e[ as in (Caron & Fox, 2014). In conjunction with \u03b1 we thus obtain three parameters (\u03b1, \u03c3, \u03c4) which fully describe the CRM and the induced partition structure.\n\n2.4 Posterior distribution\n\nIn order to define a sampling procedure of the CRMSBM we must first characterize the posterior distribution. In Caron & Fox (2014) this was calculated using a specially tailored version of Palm\u2019s formula. In this work we will use a counting argument inspired by Pitman (2003, eqn. (32)) and a reparameterization to collapse the weight-parameter (wi)i\u22651 to obtain a fairly simple analytical expression which is amenable to standard sampling procedures. The full derivation is, however, somewhat lengthy and is included in the supplementary material.\nFirst notice the distribution of the total mass T\u2113 of each of the thinned random measures \u00b5\u2113 is a tilted \u03c3-stable random variable (Pitman, 2006). If we introduce \u03b1\u2113 \u2261 \u03b2\u2113\u03b1, its density $g_{\\alpha_\\ell,\\sigma,\\tau}$ may be written as\n\n$$g_{\\alpha,\\sigma,\\tau}(t) = \\theta^{-1/\\sigma} f_\\sigma(t \\theta^{-1/\\sigma}) \\phi_\\lambda(t \\theta^{-1/\\sigma}), \\quad \\theta = \\frac{\\alpha}{\\sigma},$$\n\nwhere $\\phi_\\lambda(t) = e^{\\lambda^\\sigma - \\lambda t}$, $\\lambda = \\tau \\theta^{1/\\sigma}$ and $f_\\sigma$ is the density of a \u03c3-stable random variable. See Devroye & James (2014) for more details. According to Zolotarev\u2019s integral representation, the function $f_\\sigma$ has the following form (Zolotarev, 1964)\n\n$$f_\\sigma(x) = \\frac{\\sigma x^{-1/(1-\\sigma)}}{\\pi (1 - \\sigma)} \\int_0^\\pi du \\, A(\\sigma, u) e^{-A(\\sigma,u) / x^{\\sigma/(1-\\sigma)}}, \\quad A(\\sigma, u) = \\sin((1-\\sigma)u) \\left[ \\frac{\\sin(\\sigma u)^\\sigma}{\\sin(u)} \\right]^{\\frac{1}{1-\\sigma}}. \\tag{7}$$\n\nSince not all potential vertices (i.e. terms $w_i \\delta_{\\theta_i}$ in \u00b5) will have edges attached to them, it is useful to introduce a variable which encapsulates this distinction. We therefore define the variable $\\tilde{z}_i = 0, 1, \\ldots, K$ with the definition:\n\n$$\\tilde{z}_i = \\begin{cases} z_i & \\text{if there exists } (x, y) \\in X_\\alpha \\text{ s.t. } \\theta_i \\in \\{x, y\\}, \\\\ 0 & \\text{otherwise.} \\end{cases}$$\n\nIn addition, suppose for each measure \u00b5\u2113, the end-points of the edges associated with this measure select $k_\\ell = |\\{i : \\tilde{z}_i = \\ell\\}|$ unique atoms and $k = \\sum_{\\ell=1}^{K} k_\\ell$ is the total number of vertices in the network. Next, we consider a specific network $(A_{ij})_{i,j=1}^{k}$ and assume it is labelled such that atom (wi, \u03b8i) corresponds to a particular vertex i in the network. We also define $n_i = \\sum_j (A_{ij} + A_{ji})$ as the number of edge-endpoints that select atom i, $n_\\ell = \\sum_{i : \\tilde{z}_i = \\ell} n_i$ as the aggregated edge-endpoints that select measure \u00b5\u2113 and $n_{\\ell m} = \\sum_{\\tilde{z}_i = \\ell, \\tilde{z}_j = m} A_{ij}$ as the edges between measure \u00b5\u2113 and \u00b5m. The posterior distribution is then\n\n$$P(A, (z_i)_i, \\sigma, \\tau, (\\alpha_\\ell, s_\\ell, t_\\ell)_\\ell) = \\frac{\\Gamma(\\beta_0) \\prod_{\\ell=1}^{K} \\alpha_\\ell^{\\beta_0/K - 1} E_\\ell}{\\Gamma(\\beta_0/K)^K \\alpha^{\\beta_0} \\prod_{ij} A_{ij}!} \\prod_{\\ell m} \\frac{G(\\lambda_a + n_{\\ell m}, \\lambda_b + T_\\ell T_m)}{G(\\lambda_a, \\lambda_b)}, \\tag{8}$$\n\nwhere we have introduced:\n\n$$E_\\ell = \\frac{\\alpha_\\ell^{k_\\ell} s_\\ell^{n_\\ell - k_\\ell \\sigma - 1}}{\\Gamma(n_\\ell - k_\\ell \\sigma) e^{\\tau s_\\ell}} g_{\\alpha_\\ell,\\sigma,\\tau}(T_\\ell - s_\\ell) \\prod_{\\tilde{z}_i = \\ell} (1 - \\sigma)_{n_i}$$\n\nand $s_\\ell = \\sum_{i : \\tilde{z}_i = \\ell} w_i$ is the mass of the \"occupied\" atoms in the measure \u00b5\u2113. The posterior distribution can be seen as the product of K partition functions corresponding to the GGP, multiplied by the $K^2$ interaction factors involving the function $G(a, b) = \\Gamma(a) b^{-a}$, and corresponding to the interaction between the measures according to the block structure assumption.\nNote that the \u03b7 = 1 case, corresponding to a collapsed version of Caron & Fox (2014), can be obtained by taking the limit \u03bba = \u03bbb \u2192 \u221e, in which case $\\frac{G(\\lambda_a + n, \\lambda_b + T)}{G(\\lambda_a, \\lambda_b)} \\to e^{-T}$. When discussing the K = 1 case, we will assume this limit has been taken.\n\n2.5 Inference\n\nSampling the expression eqn. 
(8) requires three types of sampling updates: (i) the sequence of block-assignments (zi)i must be updated, (ii) in the simulations we will consider binary networks and we will therefore need to both impute the integer valued counts (if Aij > 0), as well as missing values in the network, and (iii) both the parameters associated with the random measure, \u03c3 and \u03c4, as well as the remaining variables associated with each expression E\u2113 must be updated.\nAll terms, except the densities $g_{\\alpha,\\sigma,\\tau}$, are amenable to standard sampling techniques. We opted for the approach of Lomel\u00ed et al. (2014), in which u in Zolotarev\u2019s integral representation (eqn. 7) is considered an auxiliary parameter. The full inference procedure can be found in the supplementary material, however, the main steps are [1]:\nUpdate of (zi)i: For each \u2113, impute $(w_i)_{\\tilde{z}_i = \\ell}$ once per sweep (see supplementary for details), and then iterate over i and update each zi using a Gibbs sweep from the likelihood. The Gibbs sweep is no more costly than that of a standard SBM.\nUpdate of A: Impute (\u03b7\u2113m)\u2113m and (wi)i once per sweep (see supplementary for details), and then for each (ij) such that the edge is either unobserved or must be imputed (Aij \u2265 1), generate a candidate a \u223c Poisson(\u03b7\u2113mwiwj). Then, if the edge is unobserved, simply set Aij = a, otherwise if the edge is observed and a = 0, reject the update.\nUpdate of \u03c3, \u03c4: For \u2113 = 1, . . . , K, introduce u\u2113 corresponding to u in Zolotarev\u2019s integral representation (eqn. 7) and let t\u2113 = T\u2113 \u2212 s\u2113. 
Update the four variables in \u03a6\u2113 = (\u03b1\u2113, u\u2113, s\u2113, t\u2113) and \u03c3, \u03c4 using random-walk Metropolis-Hastings updates.\n\nIn terms of computational cost, the inference procedure is of the same order as the SBM albeit with higher constants due to the overall complexity of the likelihood and because the parameters (\u03b1\u2113, u\u2113, s\u2113, t\u2113) must be sampled for each CRM. In Caron & Fox (2014), the parameters (wi)i\u22651 were sampled using Hamiltonian Monte Carlo, whereas herein they are collapsed and re-imputed.\nThe parameters \u03a6\u2113 and \u03c3, \u03c4 are important for determining the sparsity and power-law properties of the network model (Caron & Fox, 2014). To investigate convergence of the sampler for these parameters, we generated a single network problem using \u03b1 = 25, \u03c3 = 0.5, \u03c4 = 2 and evaluated 12 samplers with K = 1 on the problem. Autocorrelation plots (mean and standard deviation computed over 12 restarts) can be seen in figure 4a. All parameters mix; however, the different parameters have different mixing times, with u in particular being affected by excursions. This indicates many sampling updates of \u03a6\u2113 are required to explore the state space sufficiently and we therefore applied 50 updates of \u03a6\u2113 for each update of (zi)i and Aij. Additional validation of the sampling procedure can be found in the supplementary material.\n\n3 Experiments\n\nThe proposed method was evaluated on 11 network datasets (a description of how the datasets were obtained and prepared can be found in the supplementary material) using K = 200 in the truncated stick-breaking representation. As the criterion of evaluation we chose the AUC score on held-out edges, i.e. predicting the presence or absence of unobserved edges using the imputation method described in the previous section. 
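
The AUC criterion can be sketched as follows, assuming each held-out vertex pair comes with a predicted score (for instance a posterior edge probability) and a binary label saying whether the edge is present. The helper auc_score is a hypothetical name and uses the pairwise (Mann-Whitney) definition of AUC; it is not a routine from the authors' code.

```python
def auc_score(scores, labels):
    # Pairwise definition of AUC: the probability that a randomly chosen
    # present edge receives a higher score than a randomly chosen absent
    # edge, with ties counted as one half.
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError('AUC needs at least one present and one absent edge')
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy held-out set: two present edges and two absent edges.
print(auc_score([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

The quadratic loop is adequate for a small held-out set; a rank-based computation would be the usual choice for larger ones.
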
All networks were initially binarized by thresholding at 0, and vertices with zero edges were removed. A fraction of 5% of the edges was removed and considered as held-out data.\nTo examine the effect of using blocks, we compared the method against the method of Caron & Fox (2014) (CRM) (corresponding to \u03b7\u2113m = 1 and K = 1), a standard block-structured model with Poisson observations (pIRM) (Kemp et al., 2006), and the degree-corrected stochastic block model (DCSBM) (Herlau et al., 2014). The latter allows both block-structure and degree-heterogeneity but it is not exchangeable. More details on the simulations and methods are found in the supplementary material.\nThe pIRM was selected since it is the closest block-structured model to the CRMSBM without degree-correction. This allows us to determine the relative benefit of inferring the degree-distribution compared to only the block-structure. For the priors we selected uniform priors for \u03c3, \u03c4, \u03b1 and a Gamma(2, 1) prior for \u03b20, \u03bba, \u03bbb. Similar choices were made for the other models.\n\n[1] Code available at http://people.compute.dtu.dk/tuhe/crmsbm.\n\n(a) Autocorrelation plots (b) Link prediction\n\nFigure 4: (Left:) Autocorrelation plots of the parameters \u03b1, \u03c3, \u03c4, s, t and u for a K = 1 network drawn from the prior distribution using \u03b1 = 25, \u03c3 = 0.5 and \u03c4 = 2. The plots were obtained by evaluating the proposed sampling procedure for $10^6$ iterations and the shaded region indicates standard deviation obtained over 12 re-runs. The simulation indicates reasonable mixing for all parameters, with u being the most affected by excursions. (Right:) AUC score on held-out edges for the selected methods (averaged over 4 restarts) on 11 network datasets. 
For the same number of blocks, the CRMSBM offers good link-prediction performance compared to the method of Caron & Fox (2014) (CRM), an SBM with Poisson observations (pIRM) and the degree-corrected SBM (DCSBM) (Herlau et al., 2014). Additional information is found in the supplementary material.\n\nAll methods were evaluated for T = 2 000 iterations, and the latter half of the chains was used for link prediction. We used 4 random selections of held-out edges per network to obtain the results seen in figure 4b (the same sets of held-out edges were used for all methods). It is evident that block-structure is crucial to obtain good link prediction performance. For the block-structured methods, the results indicate additional benefits from using models which permit degree-heterogeneity on most networks, except the Hagmann brain connectivity graph. This result is possibly explained by the Hagmann graph having little edge-inhomogeneity. Comparing the CRMSBM and the DCSBM, the two models either perform on par or the CRMSBM has a slight advantage.\n\n4 Discussion and Conclusion\n\nModels of networks based on the CRM representation of Kallenberg (2005) offer one of the most important new ideas in statistical modelling of networks in recent years. To our knowledge Caron and Fox (2014) were the first to realize the benefits of this modelling approach, describe its statistical properties and provide an efficient sampling procedure.\nThe degree distribution of a network is only one of several important characteristics of a complex network. In this work we have examined how the ideas presented in Caron and Fox (2014) can be applied for a simple block-structured network model to obtain a model which admits block structure and degree correction. Our approach is a fairly straightforward generalization of the methods of Caron and Fox (2014). 
However, we have opted to explicitly represent the density of the total mass\ng_{\u03b1_\u2113,\u03c3,\u03c4} and integrate out the sociability parameters (w_i)_i, thereby reducing the number of parameters\nassociated with the CRM from the order of vertices to the order of blocks.\nThe resulting model has the increased \ufb02exibility of being able to control the degree distribution within\neach block. In practice, results of the model on 11 real-world datasets indicate that this \ufb02exibility\noffers bene\ufb01ts over purely block-structured approaches to link prediction for most networks, as well as\npotential bene\ufb01ts over alternative approaches to modelling block-structure and degree-heterogeneity.\nThe results strongly indicate that structural assumptions (such as block-structure) are important to\nobtain reasonable link prediction.\nBlock-structured network modelling is in turn the simplest structural assumption for block-modelling.\nThe extension of the method of Caron and Fox (2014) to overlapping blocks, possibly using the\ndependent random measures of Chen et al. (2013), appears fairly straightforward and should potentially\noffer a generalization of overlapping block models.\n\nAcknowledgments\n\nThis project was funded by the Lundbeck Foundation (grant nr. R105-9813).\n\nReferences\nAldous, David J. Representations for partially exchangeable arrays of random variables. Journal of Multivariate\nAnalysis, 11(4):581\u2013598, 1981.\n\nBarab\u00e1si, Albert-L\u00e1szl\u00f3. Emergence of Scaling in Random Networks. Science, 286(5439):509\u2013512, October\n1999. ISSN 00368075. doi: 10.1126/science.286.5439.509.\n\nCaron, Francois and Fox, Emily B. 
Bayesian nonparametric models of sparse and exchangeable random graphs.\narXiv preprint arXiv:1401.1137, 2014.\n\nChen, Changyou, Rao, Vinayak, Buntine, Wray, and Teh, Yee Whye. Dependent normalized random measures.\nIn Proceedings of The 30th International Conference on Machine Learning, pp. 969\u2013977, 2013.\n\nDevroye, Luc and James, Lancelot. On simulation and properties of the stable law. Statistical methods &\napplications, 23(3):307\u2013343, 2014.\n\nHerlau, Tue, Schmidt, Mikkel N, and M\u00f8rup, Morten. In\ufb01nite-degree-corrected stochastic block model. Phys.\nRev. E, 90:032819, Sep 2014. doi: 10.1103/PhysRevE.90.032819.\n\nHoover, Douglas N. Relations on probability spaces and arrays of random variables. Preprint, Institute for\nAdvanced Study, Princeton, NJ, 2, 1979.\n\nHougaard, Philip. Survival models for heterogeneous populations derived from stable distributions. Biometrika,\n73(2):387\u2013396, 1986.\n\nJames, Lancelot F. Poisson process partition calculus with applications to exchangeable models and Bayesian\nnonparametrics. arXiv preprint math/0205093, 2002.\n\nKallenberg, Olaf. Probabilistic Symmetries and Invariance Principles. Number v. 10 in Applied probability.\nSpringer, 2005. ISBN 9780387251158.\n\nKemp, Charles, Tenenbaum, Joshua B, Grif\ufb01ths, Thomas L, Yamada, Takeshi, and Ueda, Naonori. Learning\nsystems of concepts with an in\ufb01nite relational model. In AAAI, volume 3, pp. 5, 2006.\n\nKingman, John. Completely random measures. Paci\ufb01c Journal of Mathematics, 21(1):59\u201378, 1967.\n\nLomel\u00ed, Mar\u00eda, Favaro, Stefano, and Teh, Yee Whye. A marginal sampler for \u03c3-stable Poisson-Kingman mixture\nmodels. arXiv preprint arXiv:1407.4211, 2014.\n\nNewman, M. E. J., Strogatz, S. H., and Watts, D. J. Random graphs with arbitrary degree distributions and their\napplications. Physical Review E, 64(2), July 2001. ISSN 1063-651X.\n\nOrbanz, Peter and Roy, Daniel M. 
Bayesian models of graphs, arrays and other exchangeable random structures.\nPattern Analysis and Machine Intelligence, IEEE Transactions on, 37(2):437\u2013461, 2015.\n\nPitman, Jim. Poisson-Kingman partitions. Lecture Notes-Monograph Series, pp. 1\u201334, 2003.\n\nPitman, Jim. Combinatorial Stochastic Processes: Ecole D\u2019Et\u00e9 de Probabilit\u00e9s de Saint-Flour XXXII-2002.\nSpringer, 2006.\n\nStrogatz, Steven H. Exploring complex networks. Nature, 410(6825):268\u2013276, 2001.\n\nVeitch, Victor and Roy, Daniel M. The Class of Random Graphs Arising from Exchangeable Random Measures.\nArXiv e-prints, December 2015.\n\nWhite, Harrison C, Boorman, Scott A, and Breiger, Ronald L. Social structure from multiple networks. I.\nBlockmodels of roles and positions. American journal of sociology, pp. 730\u2013780, 1976.\n\nXu, Zhao, Tresp, Volker, Yu, Kai, and Kriegel, Hans-Peter. In\ufb01nite hidden relational models. In Proceedings of\nthe 22nd International Conference on Uncertainty in Arti\ufb01cial Intelligence (UAI 2006), 2006.\n\nZolotarev, Vladimir Mikhailovich. On the representation of stable laws by integrals. Trudy Matematicheskogo\nInstituta im. VA Steklova, 71:46\u201350, 1964.", "award": [], "sourceid": 2117, "authors": [{"given_name": "Tue", "family_name": "Herlau", "institution": "Technical University of Denmark"}, {"given_name": "Mikkel", "family_name": "Schmidt", "institution": "DTU"}, {"given_name": "Morten", "family_name": "M\u00f8rup", "institution": "Technical University of Denmark"}]}