{"title": "Bayesian nonparametric models for bipartite graphs", "book": "Advances in Neural Information Processing Systems", "page_first": 2051, "page_last": 2059, "abstract": "We develop a novel Bayesian nonparametric model for random bipartite graphs. The model is based on the theory of completely random measures and is able to handle a potentially infinite number of nodes. We show that the model has appealing properties and in particular it may exhibit a power-law behavior. We derive a posterior characterization, an Indian Buffet-like generative process for network growth, and a simple and efficient Gibbs sampler for posterior simulation. Our model is shown to be well fitted to several real-world social networks.", "full_text": "Bayesian nonparametric models for bipartite graphs\n\nFranc\u00b8ois Caron\n\nINRIA\n\nIMB - University of Bordeaux\n\nTalence, France\n\nFrancois.Caron@inria.fr\n\nAbstract\n\nWe develop a novel Bayesian nonparametric model for random bipartite graphs.\nThe model is based on the theory of completely random measures and is able\nto handle a potentially in\ufb01nite number of nodes. We show that the model has\nappealing properties and in particular it may exhibit a power-law behavior. We\nderive a posterior characterization, a generative process for network growth, and\na simple Gibbs sampler for posterior simulation. Our model is shown to be well\n\ufb01tted to several real-world social networks.\n\n1\n\nIntroduction\n\nThe last few years have seen a tremendous interest in the study, understanding and statistical mod-\neling of complex networks [14, 6]. A network is a set if items, called vertices, with connections\nbetween them, called edges. In this article, we shall focus on bipartite networks, also known as two-\nmode, af\ufb01liation or collaboration networks [16, 17]. In bipartite networks, items are divided into two\ndifferent types A and B, and only connections between items of different types are allowed. 
Examples include movie actors co-starring in the same movie, scientists co-authoring a scientific paper, internet users posting messages on the same forum, people reading the same book or listening to the same song, company directors sitting on the same board, etc. Following the readers-books example, we will refer to items of type A as readers and items of type B as books. An example of a bipartite graph is shown in Figure 1(b). An important summarizing quantity of a bipartite graph is the degree distribution of readers (resp. books) [14]. The degree of a vertex in a network is the number of edges connected to that vertex. Degree distributions of real-world networks are often strongly non-Poissonian and exhibit a power-law behavior [15].

A bipartite graph can be represented by a set of binary variables (z_ij) where z_ij = 1 if reader i has read book j, and 0 otherwise. In many situations, the number of available books may be very large and potentially unknown. In this case, a Bayesian nonparametric (BNP) approach can be sensible, by assuming that the pool of books is infinite. To formalize this framework, it will be convenient to represent the bipartite graph by a collection of atomic measures Z_i, i = 1, . . . , n with

Z_i = Σ_{j=1}^∞ z_ij δ_{θ_j}   (1)

where {θ_j} is the set of books and typically Z_i only has a finite set of non-zero z_ij, corresponding to the books reader i has read. Griffiths and Ghahramani [8, 9] have proposed a BNP model for such binary random measures. The so-called Indian Buffet Process (IBP) is a simple generative process for the conditional distribution of Z_i given Z_1, . . . , Z_{i−1}. Such a process can be constructed by considering that the binary measures Z_i are i.i.d. from some random measure drawn from a beta process [19, 10]. It has found several applications for inferring hidden causes [20], choices [7] or features [5].
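As a concrete (finite-truncation) illustration of this binary representation, the following sketch builds a toy readers-books incidence matrix and computes the two degree sequences discussed above; the matrix values are hypothetical:

```python
import numpy as np

# Toy readers-books incidence matrix: Z[i, j] = 1 if reader i read book j.
# This is a finite truncation of the atomic-measure representation
# Z_i = sum_j z_ij * delta_{theta_j}; the entries are made up for illustration.
Z = np.array([
    [1, 1, 0, 0, 1],   # reader 0
    [0, 1, 1, 0, 0],   # reader 1
    [1, 1, 1, 1, 0],   # reader 2
])

reader_degrees = Z.sum(axis=1)  # number of books read by each reader
book_degrees = Z.sum(axis=0)    # number of readers of each book

print(reader_degrees)  # [3 2 4]
print(book_degrees)    # [2 3 2 1 1]
```

The degree distribution of a real network is just the histogram of one of these two sequences.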
Teh and Görür [18] proposed a three-parameter extension of the IBP, named the stable IBP, that makes it possible to model a power-law behavior for the degree distribution of books. Although more flexible, the stable IBP still induces a Poissonian distribution for the degree of readers.

In this paper, we propose a novel Bayesian nonparametric model for bipartite graphs that addresses some of the limitations of the stable IBP, while retaining computational tractability. We assume that each book j is assigned a positive popularity parameter w_j > 0; larger weights indicate a higher probability of being read. Similarly, each reader i is assigned a positive parameter γ_i which represents his or her propensity to read books: the higher γ_i, the more books reader i is willing to read. Given the weights w_j and γ_i, reader i reads book j with probability 1 − exp(−γ_i w_j). We will consider that the weights w_j and/or γ_i are the points of a Poisson process with a given Lévy measure. We show that depending on the choice of the Lévy measure, a power-law behavior can be obtained for the degree distribution of books and/or readers. Moreover, using a set of suitably chosen latent variables, we can derive a generative process for network growth, and an efficient Gibbs sampler for approximate inference. We provide illustrations of the fit of the proposed model on several real-world bipartite social networks. Finally, we discuss some potentially useful extensions of our work, in particular to latent factor models.

2 Statistical Model

2.1 Completely Random Measures

We first provide a brief overview of completely random measures (CRM) [12, 13] before describing the BNP model for bipartite graphs in Section 2.2. Let Θ be a measurable space. A CRM is a random measure G such that for any collection of disjoint measurable subsets A_1, . . . , A_n of Θ, the random masses of the subsets G(A_1), . . . , G(A_n) are independent. A CRM can be decomposed into a sum of three independent parts: a non-random measure, a countable collection of atoms with fixed locations, and a countable collection of atoms with random masses at random locations. In this paper, we will be concerned with models defined by CRMs with random masses at random locations, i.e. G = Σ_{j=1}^∞ w_j δ_{θ_j}. The law of G can be characterized in terms of a Poisson process over the point set {(w_j, θ_j), j = 1, . . . , ∞} ⊂ R_+ × Θ. The mean measure Λ of this Poisson process is known as the Lévy measure. We will assume in the following that the Lévy measure decomposes as a product of two non-atomic densities, i.e. that G is a homogeneous CRM with Λ(dw, dθ) = λ(w)h(θ)dwdθ, where h : Θ → [0, +∞) and ∫_Θ h(θ)dθ = 1. This implies that the locations of the atoms in G are independent of the masses, and are i.i.d. from h, while the masses are distributed according to a Poisson process over R_+ with mean intensity λ. We will further assume that the total mass G(Θ) = Σ_{j=1}^∞ w_j is positive and finite with probability one, which is guaranteed if the following conditions are satisfied

∫_0^∞ λ(w)dw = ∞ and ∫_0^∞ (1 − exp(−w))λ(w)dw < ∞   (2)

We denote by g(x) the probability density function of G(Θ) evaluated at x. We will refer to λ as the Lévy intensity in the following, and to h as the base density of G, and write G ∼ CRM(λ, h).
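As a concrete special case used repeatedly below, the gamma process has Lévy intensity λ(w) = α w^{−1} e^{−τw}, for which the Laplace exponent ψ_λ(t) = ∫_0^∞ (1 − e^{−tw}) λ(w) dw equals α log(1 + t/τ) and κ(n, z) = ∫_0^∞ λ(w) w^n e^{−zw} dw equals α Γ(n)/(τ + z)^n. A quick numerical sanity check of these closed forms (a sketch, assuming SciPy is available; the parameter values are arbitrary):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma as gamma_fn

# Hypothetical gamma-process parameters, for illustration only
alpha, tau = 2.0, 1.5
levy = lambda w: alpha / w * np.exp(-tau * w)   # Lévy intensity λ(w) = α w^{-1} e^{-τw}

# Laplace exponent ψ_λ(t) = ∫_0^∞ (1 - e^{-tw}) λ(w) dw, by quadrature and in closed form
t = 3.0
psi_num, _ = quad(lambda w: (1 - np.exp(-t * w)) * levy(w), 0, np.inf)
psi_closed = alpha * np.log1p(t / tau)

# κ(n, z) = ∫_0^∞ λ(w) w^n e^{-zw} dw = α Γ(n) / (τ + z)^n for the gamma process
n, z = 3, 2.0
kappa_num, _ = quad(lambda w: levy(w) * w**n * np.exp(-z * w), 0, np.inf)
kappa_closed = alpha * gamma_fn(n) / (tau + z) ** n

print(abs(psi_num - psi_closed) < 1e-6, abs(kappa_num - kappa_closed) < 1e-6)
```

These two functionals are exactly the quantities that drive the generative process and the Gibbs sampler derived in the following sections.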
We will also note

ψ_λ(t) = −log E[exp(−t G(Θ))] = ∫_0^∞ (1 − exp(−tw))λ(w)dw   (3)

ψ̃_λ(t, b) = ∫_0^∞ (1 − exp(−tw))λ(w) exp(−bw)dw   (4)

κ(n, z) = ∫_0^∞ λ(w) w^n e^{−zw} dw   (5)

As a notable particular example of CRM, we can mention the generalized gamma process (GGP) [1], whose Lévy intensity is given by

λ(w) = (α / Γ(1 − σ)) w^{−σ−1} e^{−wτ}

The GGP encompasses the gamma process (σ = 0), the inverse Gaussian process (σ = 0.5) and the stable process (τ = 0) as special cases. A table in the supplementary material provides the expressions of λ, ψ and κ for these processes.

2.2 A Bayesian nonparametric model for bipartite graphs

Let G ∼ CRM(λ, h) where λ satisfies conditions (2). A draw G takes the form

G = Σ_{j=1}^∞ w_j δ_{θ_j}   (6)

where {θ_j} is the set of books and {w_j} the set of popularity parameters of the books. For i = 1, . . . , n, consider the latent exponential process

V_i = Σ_{j=1}^∞ v_ij δ_{θ_j}   (7)

defined for j = 1, . . . , ∞ by v_ij | w_j ∼ Exp(w_j γ_i), where Exp(a) denotes the exponential distribution of rate a. The higher w_j and/or γ_i, the lower v_ij. We then define the binary process Z_i conditionally on V_i by

Z_i = Σ_{j=1}^∞ z_ij δ_{θ_j} with z_ij = 1 if v_ij < 1, z_ij = 0 otherwise   (8)

By integrating out the latent variables v_ij we clearly have p(z_ij = 1 | w_j, γ_i) = 1 − exp(−γ_i w_j).

Proposition 1 Z_i is marginally characterized by a Poisson process over the point set {θ*_j, j = 1, . . . , ∞} ⊂ Θ, of intensity measure ψ_λ(γ_i)h(θ*).
Hence, the total mass Z_i(Θ) = Σ_{j=1}^∞ z_ij, which corresponds to the total number of books read by reader i, is finite with probability one and admits a Poisson(ψ_λ(γ_i)) distribution, where ψ_λ is defined in Equation (3), while the locations θ*_j are i.i.d. from h.

The proof, which makes use of Campbell's theorem for point processes [13], is given in supplementary material. As an example, for the gamma process we have Z_i(Θ) ∼ Poisson(α log(1 + γ_i/τ)).

It will be useful in the following to introduce a censored version of the latent process V_i, defined by

U_i = Σ_{j=1}^∞ u_ij δ_{θ_j}   (9)

where u_ij = min(v_ij, 1), for i = 1, . . . , n and j = 1, . . . , ∞. Note that Z_i can be obtained deterministically from U_i.

2.3 Characterization of the conditional distributions

The conditional distribution of G given Z_1, . . . , Z_n cannot be obtained in closed form¹. We will make use of the latent processes U_i. In this section, we derive formulas for the conditional laws P(U_1, . . . , U_n | G), P(U_1, . . . , U_n) and P(G | U_1, . . . , U_n). Based on these results, we derive in Section 2.4 a generative process and in Section 2.5 a Gibbs sampler for our model, both of which rely on the introduction of these latent variables.

Assume that K books {θ_1, . . . , θ_K} have appeared. We write K_i = Z_i(Θ) = Σ_{j=1}^∞ z_ij for the degree of reader i (number of books read by reader i) and m_j = Σ_{i=1}^n Z_i({θ_j}) = Σ_{i=1}^n z_ij for the degree of book j (number of people having read book j). The conditional likelihood of U_1, . . . , U_n given G is given by

P(U_1, . . . , U_n | G) = [Π_{i=1}^n Π_{j=1}^K γ_i^{z_ij} w_j^{z_ij} exp(−γ_i w_j u_ij)] exp(−Σ_{i=1}^n γ_i G(Θ\{θ_1, . . . , θ_K}))
= (Π_{i=1}^n γ_i^{K_i}) [Π_{j=1}^K w_j^{m_j} exp(−w_j Σ_{i=1}^n γ_i (u_ij − 1))] exp(−(Σ_{i=1}^n γ_i) G(Θ))   (10)

¹ In the case where γ_i = γ, it is possible to derive P(Z_1, . . . , Z_n) and P(Z_{n+1} | Z_1, . . . , Z_n) where the random measure G and the latent variables U are marginalized out. This particular case is described in supplementary material.

Proposition 2 The marginal distribution P(U_1, . . . , U_n) is given by

P(U_1, . . . , U_n) = (Π_{i=1}^n γ_i^{K_i}) exp[−ψ_λ(Σ_{i=1}^n γ_i)] Π_{j=1}^K h(θ_j) κ(m_j, Σ_{i=1}^n γ_i u_ij)   (11)

where ψ_λ and κ are resp. defined by Eq. (3) and (5).
Proof. The proof, detailed in supplementary material, is obtained by an application of the Palm formula for CRMs [3, 11], and is the same as that of Theorem 1 in [2].

Proposition 3 The conditional distribution of G given the latent processes U_1, . . . , U_n can be expressed as

G = G* + Σ_{j=1}^K w_j δ_{θ_j}   (12)

where G* and the (w_j) are mutually independent with

G* ∼ CRM(λ*, h) where λ*(w) = λ(w) exp(−w Σ_{i=1}^n γ_i)   (13)

and the masses have density

P(w_j | rest) = λ(w_j) w_j^{m_j} exp(−w_j Σ_{i=1}^n γ_i u_ij) / κ(m_j, Σ_{i=1}^n γ_i u_ij)   (14)

Proof.
The proof, based on the application of the Palm formula and detailed in supplementary material, is the same as that of Theorem 2 in [2].

In the case of the GGP, G* is still a GGP of parameters (α* = α, σ* = σ, τ* = τ + Σ_{i=1}^n γ_i), while the w_j's are conditionally gamma distributed, i.e.

w_j | rest ∼ Gamma(m_j − σ, τ + Σ_{i=1}^n γ_i u_ij)

Corollary 4 The predictive distribution of Z_{n+1} given the latent processes U_1, . . . , U_n is given by

Z_{n+1} = Z*_{n+1} + Σ_{j=1}^K z_{n+1,j} δ_{θ_j}

where the z_{n+1,j} are independent of Z*_{n+1} with

z_{n+1,j} | U ∼ Ber(1 − κ(m_j, γ_{n+1} + Σ_{i=1}^n γ_i u_ij) / κ(m_j, Σ_{i=1}^n γ_i u_ij))

where Ber is the Bernoulli distribution and Z*_{n+1} is a homogeneous Poisson process over Θ of intensity measure ψ_{λ*}(γ_{n+1}) h(θ).

For the GGP, we have

Z*_{n+1}(Θ) ∼ Poisson((α/σ)[(τ + Σ_{i=1}^{n+1} γ_i)^σ − (τ + Σ_{i=1}^n γ_i)^σ]) if σ ≠ 0
Z*_{n+1}(Θ) ∼ Poisson(α log(1 + γ_{n+1}/(τ + Σ_{i=1}^n γ_i))) if σ = 0

and

z_{n+1,j} | U ∼ Ber(1 − (1 + γ_{n+1}/(τ + Σ_{i=1}^n γ_i u_ij))^{−m_j+σ})

Finally, we consider the distribution of u_{n+1,j} | z_{n+1,j} = 1, u_{1:n,j}.
This is given by

p(u_{n+1,j} | z_{n+1,j} = 1, u_{1:n,j}) ∝ κ(m_j + 1, u_{n+1,j} γ_{n+1} + Σ_{i=1}^n γ_i u_ij) 1_{u_{n+1,j} ∈ [0,1]}   (15)

In supplementary material, we show how to sample from this distribution by the inverse cdf method for the GGP.

[Figure 1: Illustration of the generative process described in Section 2.4. Panel (a): scores assigned by three readers to books; panel (b): the corresponding bipartite graph between readers A1-A3 and books B1-B7.]

2.4 A generative process

In this section we describe the generative process for Z_i given (U_1, . . . , U_{i−1}), G being integrated out. This reinforcement process, where popular books are more likely to be picked, is reminiscent of the generative process for the beta-Bernoulli process, popularized under the name of the Indian buffet process [8]. Let x_ij = −log(u_ij) ≥ 0 be latent positive scores.

Consider a set of n readers who successively enter a library with an infinite number of books. Each reader i = 1, . . . , n has some interest in reading, quantified by a positive parameter γ_i > 0. The first reader picks a number K_1 ∼ Poisson(ψ_λ(γ_1)) of books. He then assigns a positive score x_1j = −log(u_1j) > 0 to each of these books, where u_1j is drawn from distribution (15).

Now consider that reader i enters the library, knowing which books were read by the previous readers and their scores. Let K be the total number of books chosen by the previous i − 1 readers, and m_j the number of times each of the K books has been read. Then for each book j = 1, . . . , K, reader i will choose this book with probability

1 − κ(m_j, γ_i + Σ_{k=1}^{i−1} γ_k u_kj) / κ(m_j, Σ_{k=1}^{i−1} γ_k u_kj)

and will then choose an additional number K_i^+ of books, where

K_i^+ ∼ Poisson(ψ̃_λ(γ_i, Σ_{k=1}^{i−1} γ_k))

Reader i then assigns a score x_ij = −log(u_ij) > 0 to each book j he has read, where u_ij is drawn from (15); otherwise he sets the default score x_ij = 0. This generative process is illustrated in Figure 1 together with the underlying bipartite graph. Figure 2 shows draws from this generative process with a GGP with parameters γ_i = 2 for all i, τ = 1, and different values of α and σ.

2.5 Gibbs sampling

From the results derived in Proposition 3, a Gibbs sampler can easily be derived to approximate the posterior distribution P(G, U | Z). The sampler successively updates U given (w, G*(Θ)), then (w, G*(Θ)) given U. We present here the conditional distributions in the GGP case. For i = 1, . . . , n, j = 1, . . . , K, set u_ij = 1 if z_ij = 0; otherwise sample

u_ij | z_ij, w_j, γ_i ∼ rExp(γ_i w_j, 1)

where rExp(λ, a) is the right-truncated exponential distribution of pdf λ exp(−λx)/(1 − exp(−λa)) 1_{x ∈ [0,a]}, from which we can sample exactly. For j = 1, . . . , K, sample

w_j | U, γ ∼ Gamma(m_j − σ, τ + Σ_{i=1}^n γ_i u_ij)

and the total mass G*(Θ) follows a distribution g*(w) ∝ g(w) exp(−w Σ_{i=1}^n γ_i), where g(w) is the density of G(Θ). In the case of the GGP, g*(w) is an exponentially tilted stable distribution for which exact samplers exist [4]. In the particular case of the gamma process, we have the simple update G*(Θ) ∼ Gamma(α, τ + Σ_{i=1}^n γ_i).

[Figure 2: Realisations from the generative process of Section 2.4 with a GGP of parameters γ = 2, τ = 1 and various values of α and σ: (a) α = 1, σ = 0; (b) α = 5, σ = 0; (c) α = 10, σ = 0; (d) α = 2, σ = 0.1; (e) α = 2, σ = 0.5; (f) α = 2, σ = 0.9.]

3 Update of γ_i and other hyperparameters

We may also consider the weight parameters γ_i to be unknown and estimate them from the graph. We can assign a gamma prior γ_i ∼ Gamma(a_γ, b_γ) with parameters (a_γ > 0, b_γ > 0) and update each γ_i conditionally on the other variables with

γ_i | G, U ∼ Gamma(a_γ + Σ_{j=1}^K z_ij, b_γ + Σ_{j=1}^K w_j u_ij + G*(Θ))

In this case, the marginal distribution of Z_i(Θ), hence the degree distribution of readers, follows a continuous mixture of Poisson distributions, which offers more flexibility in the modelling.

We may also go a step further and consider that there is an infinite number of readers with weights γ_i associated to a given CRM Γ ∼ CRM(λ_γ, h_γ) on a measurable space of readers Θ̃, i.e. Γ = Σ_{i=1}^∞ γ_i δ_{θ̃_i}. This provides a lot of flexibility in the modelling of the distribution of the degree of readers, allowing in particular to obtain a power-law behavior, as shown in Section 5. We focus here on the case where Γ is drawn from a generalized gamma process of parameters (α_γ, σ_γ, τ_γ) for simplicity. Conditionally on (w, G*(Θ), U), we have Γ = Γ* + Σ_{i=1}^n γ_i δ_{θ̃_i} where, for i = 1, . . . , n,

γ_i | G, U ∼ Gamma(Σ_{j=1}^K z_ij − σ_γ, τ_γ + Σ_{j=1}^K w_j u_ij + G*(Θ))

and Γ* ∼ CRM(λ*_γ, h_γ) with λ*_γ(γ) = λ_γ(γ) exp(−γ(Σ_{j=1}^K w_j + G*(Θ))). In this case, the update for (w, G*) conditional on (U, γ, Γ(Θ̃)) is now, for j = 1, . . . , K,

w_j | U, Γ ∼ Gamma(m_j − σ, τ + Σ_{i=1}^n γ_i u_ij + Γ*(Θ̃))

and G* ∼ CRM(λ*, h) with λ*(w) = λ(w) exp(−w(Σ_{i=1}^n γ_i + Γ*(Θ̃))). Note that there is now symmetry in the treatment of books and readers. For the scale parameter α of the GGP, we can assign a gamma prior α ∼ Gamma(a_α, b_α) and update it with α | γ ∼ Gamma(a_α + K, b_α + ψ_λ(Σ_{i=1}^n γ_i + Γ*(Θ̃))). Other parameters of the GGP can be updated using a Metropolis-Hastings step.

4 Discussion

Power-law behavior. We now discuss some of the properties of the model, in the case of the GGP. The total number of books read by n readers is O(n^σ). Moreover, for σ > 0, the degree distribution of books follows a power law: asymptotically, the proportion of books read by m readers is O(m^{−1−σ}) (details in supplementary material). These results are similar to those of the stable IBP [18].
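For the gamma-process special case (σ = 0, equal reader weights γ_i ≡ γ), every quantity in the generative process of Section 2.4 has a closed form: a previously seen book j with degree m_j and weighted score sum S_j = γ Σ_k u_kj is re-picked with probability 1 − ((τ + S_j)/(τ + γ + S_j))^{m_j}, the number of brand-new books for reader i is Poisson(α log(1 + γ/(τ + (i−1)γ))), and the score density p(u) ∝ (τ + S_j + γu)^{−(m_j+1)} on [0, 1] can be inverted analytically. The following is a sketch under those assumptions (the inverse-cdf expressions are our own elementary derivation, with arbitrary parameter values):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical gamma-process parameters (sigma = 0) and equal reader weight
alpha, tau, gam = 3.0, 1.0, 2.0
n_readers = 50

def sample_score(m, S):
    """Inverse-cdf draw of u in [0, 1] from p(u) ∝ (tau + S + gam*u)^-(m+1),
    i.e. kappa(m + 1, gam*u + S) for the gamma process; m is the book's degree
    among previous readers, S the weighted sum of their scores."""
    A = tau + S
    p = rng.uniform()
    if m == 0:                      # brand-new book: p(u) ∝ 1/(A + gam*u)
        return A * ((1.0 + gam / A) ** p - 1.0) / gam
    target = A ** (-m) - p * (A ** (-m) - (A + gam) ** (-m))
    return (target ** (-1.0 / m) - A) / gam

degrees, score_sums = [], []        # m_j and S_j = gam * sum_k u_kj, per book

for i in range(n_readers):
    # Existing book j is re-picked with prob. 1 - ((tau+S_j)/(tau+gam+S_j))^m_j
    for j in range(len(degrees)):
        m, S = degrees[j], score_sums[j]
        if rng.uniform() < 1.0 - ((tau + S) / (tau + gam + S)) ** m:
            u = sample_score(m, S)
            degrees[j] += 1
            score_sums[j] += gam * u
    # Brand-new books: Poisson(alpha * log(1 + gam/(tau + i*gam)))
    for _ in range(rng.poisson(alpha * np.log1p(gam / (tau + i * gam)))):
        degrees.append(1)
        score_sums.append(gam * sample_score(0, 0.0))

print(len(degrees), sum(degrees))   # number of distinct books, number of edges
```

Running this with σ > 0 instead (the GGP case) requires the κ ratios of Corollary 4 but follows the same reinforcement structure.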
However, in our case, a similar behavior can be obtained for the degree distribution of readers by assigning a GGP prior to the reader weights, while it will always be Poisson for the stable IBP.

Connection to the IBP. The stable beta process [18] is a particular case of our construction, obtained by setting equal weights γ_i = γ and Lévy measure

λ(w) = α (Γ(1 + c) / (Γ(1 − σ)Γ(c + σ))) γ (1 − e^{−γw})^{−σ−1} e^{−γw(c+σ)}   (16)

The proof is obtained by a change of variable from the Lévy measure of the stable beta process.

Extensions to latent factor models. So far, we have assumed that the binary matrix Z was observed. The proposed model can also be used as a prior for latent factor models, similarly to the IBP. As an example of the potential usefulness of our model compared to the IBP, consider the extraction of features from time series of different lengths. Longer time series are likely to exhibit more features than shorter ones, and it is sensible in this case to assume different weights γ_i. In a more general setting, we may want γ_i to depend on a set of metadata associated with reader i. Inference for latent factor models is described in supplementary material.

5 Illustrations on real-world social networks

We now consider estimating the parameters of our model and evaluating its predictive performance on six bipartite social networks of various sizes. We first provide a short description of these networks. The dataset 'Boards' contains information about members of the boards of Norwegian companies sitting on the same board in August 2011². 'Forum' is a forum network about web users contributing to the same forums³. 'Books' concerns data collected from the Book-Crossing community about users providing ratings on books⁴, from which we extracted the bipartite network from the ratings.
'Citations' is the co-authorship network based on preprints posted to the Condensed Matter section of ArXiv between 1995 and 1999 [15]. 'Movielens100k' contains information about users rating particular movies⁵, from which we extracted the bipartite network. Finally, 'IMDB' contains information about actors co-starring in a movie⁶. The sizes of the different networks are given in Table 1.

Table 1: Size of the different datasets and test log-likelihood of four different models.

Dataset         n      K       Edges    S-IBP         SG           IG             GGP
Boards          355    5766    1746     9.82 (29.8)   8.3 (30.8)   -145.1 (81.9)  -68.6 (31.9)
Forum           899    552     7089     -6.7e3        -6.7e3       -5.5e3         -5.6e3
Books           5064   36275   49997    83.1          214          4.6e4          4.4e4
Citations       16726  22016   58595    -3.7e4        -3.7e4       -3.1e4         -3.4e4
Movielens100k   943    1682    100000   -6.7e4        -6.7e4       -5.5e4         -5.5e4
IMDB            28088  178074  341313   -1.5e5        -1.5e5       -1.1e5         -1.1e5

We evaluate the fit of four different models on these datasets. First, the stable IBP [18] with parameters (α_IBP, τ_IBP, σ_IBP) (S-IBP). Second, our model where the parameter γ is the same over different readers and is assigned a flat prior (SG). Third, our model where each γ_i ∼ Gamma(a_γ, b_γ), where (a_γ, b_γ) are unknown parameters with a flat improper prior (IG). Finally, our model with a GGP model for the γ_i, with parameters (α_γ, σ_γ, τ_γ) (GGP).
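Whatever the variant, the held-out log-likelihood over books observed in training reduces to a sum of Bernoulli terms with p(z_ij = 1 | w_j, γ_i) = 1 − exp(−γ_i w_j). A minimal sketch of that computation, with hypothetical toy values (the full evaluation also accounts for previously unseen books via Z*):

```python
import numpy as np

def edge_loglik(Z, gamma, w):
    """Bernoulli log-likelihood of a binary biadjacency matrix Z under
    p(z_ij = 1) = 1 - exp(-gamma_i * w_j); log p(z_ij = 0) = -gamma_i * w_j."""
    rate = np.outer(gamma, w)          # gamma_i * w_j
    logp1 = np.log(-np.expm1(-rate))   # log(1 - exp(-rate)), computed stably
    return float(np.sum(np.where(Z == 1, logp1, -rate)))

# Hypothetical toy values for two readers and two books
Z = np.array([[1, 0], [1, 1]])
gamma = np.array([0.5, 2.0])
w = np.array([1.0, 0.3])
print(edge_loglik(Z, gamma, w))
```

Using `expm1` avoids catastrophic cancellation in 1 − e^{−γw} when γ_i w_j is small.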
We divide each dataset between a training set containing 3/4 of the readers and a test set with the remaining readers. For each model, we approximate the posterior mean of the unknown parameters (respectively (α_IBP, τ_IBP, σ_IBP), γ, (a_γ, b_γ) and (α_γ, σ_γ, τ_γ) for S-IBP, SG, IG and GGP) given the training network with a Gibbs sampler with 10000 burn-in iterations followed by 10000 samples; then we evaluate the log-likelihood of the estimated model on the test data. For GGP, we use α_γ^test = α̂_γ/3 to take into account the different sample sizes. For 'Boards', we do 10 replications with random permutations given the small sample size and report the standard deviation together with the mean value. Table 1 shows the results over the different networks for the different models. Typically, S-IBP and SG give very similar results. This is not surprising, as they share the same properties, i.e. a Poissonian degree distribution for readers and a power-law degree distribution for books. Both methods perform better solely on the Boards dataset, where the Poisson assumption on the number of people sitting on the same board makes sense. On all the other datasets, IG and GGP perform better and similarly, with slightly better performance for IG. These two models are better able to capture the power-law distribution of the degrees of readers. These properties are shown in Figures 3 and 4, which resp. give the empirical degree distributions of the test network and a draw from the estimated models, for the IMDB dataset and the Books dataset. It is clearly seen that the four models are able to capture the power-law behavior of the degree distribution of actors (Figure 3(e-h)) or books (Figure 4(e-h)). However, only IG and GGP are able to capture the power-law behavior of the degree distribution of movies (Figure 3(a-d)) or readers (Figure 4(a-d)).

[Figure 3: Degree distributions for movies (a-d) and actors (e-h) for the IMDB movie-actor dataset with the four different models (S-IBP, SG, IG, GGP). Data are represented by red plus signs and samples from the model by blue crosses.]

[Figure 4: Degree distributions for readers (a-d) and books (e-h) for the BX books dataset with the four different models (S-IBP, SG, IG, GGP). Data are represented by red plus signs and samples from the model by blue crosses.]

² Data can be downloaded from http://www.boardsandgender.com/data.php
³ Data for the forum and citation datasets can be downloaded from http://toreopsahl.com/datasets/
⁴ http://www.informatik.uni-freiburg.de/ cziegler/BX/
⁵ The dataset can be downloaded from http://www.grouplens.org
⁶ The dataset can be downloaded from http://www.cise.ufl.edu/research/sparse/matrices/Pajek/IMDB.html

References

[1] A. Brix. Generalized gamma measures and shot-noise Cox processes. Advances in Applied Probability, 31(4):929-953, 1999.
[2] F. Caron and Y. W. Teh. Bayesian nonparametric models for ranked data. In Neural Information Processing Systems (NIPS), 2012.
[3] D. J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes. Springer Verlag, 2008.
[4] L. Devroye.
Random variate generation for exponentially and polynomially tilted stable distributions. ACM Transactions on Modeling and Computer Simulation (TOMACS), 19(4):18, 2009.
[5] E. B. Fox, E. B. Sudderth, M. I. Jordan, and A. S. Willsky. Sharing features among dynamical systems with beta processes. In Advances in Neural Information Processing Systems, volume 22, pages 549-557, 2009.
[6] A. Goldenberg, A. X. Zheng, S. E. Fienberg, and E. M. Airoldi. A survey of statistical network models. Foundations and Trends in Machine Learning, 2(2):129-233, 2010.
[7] D. Görür, F. Jäkel, and C. E. Rasmussen. A choice model with infinitely many latent features. In Proceedings of the 23rd International Conference on Machine Learning, pages 361-368. ACM, 2006.
[8] T. Griffiths and Z. Ghahramani. Infinite latent feature models and the Indian buffet process. In NIPS, 2005.
[9] T. Griffiths and Z. Ghahramani. The Indian buffet process: an introduction and review. Journal of Machine Learning Research, 12(April):1185-1224, 2011.
[10] N. L. Hjort. Nonparametric Bayes estimators based on beta processes in models for life history data. The Annals of Statistics, 18(3):1259-1294, 1990.
[11] L. F. James, A. Lijoi, and I. Prünster. Posterior analysis for normalized random measures with independent increments. Scandinavian Journal of Statistics, 36(1):76-97, 2009.
[12] J. F. C. Kingman. Completely random measures. Pacific Journal of Mathematics, 21(1):59-78, 1967.
[13] J. F. C. Kingman. Poisson Processes, volume 3. Oxford University Press, USA, 1993.
[14] M. E. J. Newman. The structure and function of complex networks. SIAM Review, pages 167-256, 2003.
[15] M. E. J. Newman, S. H. Strogatz, and D. J. Watts. Random graphs with arbitrary degree distributions and their applications. Physical Review E, 64(2):26118, 2001.
[16] M. E. J. Newman, D. J. Watts, and S. H. Strogatz. Random graph models of social networks. Proceedings of the National Academy of Sciences, 99:2566, 2002.
[17] J. J. Ramasco, S. N. Dorogovtsev, and R. Pastor-Satorras. Self-organization of collaboration networks. Physical Review E, 70(3):036106, 2004.
[18] Y. W. Teh and D. Görür. Indian buffet processes with power-law behavior. In NIPS, 2009.
[19] R. Thibaux and M. Jordan. Hierarchical beta processes and the Indian buffet process. In International Conference on Artificial Intelligence and Statistics, volume 11, pages 564-571, 2007.
[20] F. Wood, T. L. Griffiths, and Z. Ghahramani. A non-parametric Bayesian method for inferring hidden causes. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, volume 22, 2006.
", "award": [], "sourceid": 1021, "authors": [{"given_name": "Francois", "family_name": "Caron", "institution": null}]}