{"title": "Inferring Interaction Networks using the IBP applied to microRNA Target Prediction", "book": "Advances in Neural Information Processing Systems", "page_first": 235, "page_last": 243, "abstract": "Determining interactions between entities and the overall organization and clustering of nodes in networks is a major challenge when analyzing biological and social network data. Here we extend the Indian Buffet Process (IBP), a nonparametric Bayesian model, to integrate noisy interaction scores with properties of individual entities for inferring interaction networks and clustering nodes within these networks. We present an application of this method to study how microRNAs regulate mRNAs in cells. Analysis of synthetic and real data indicates that the method improves upon prior methods, correctly recovers interactions and clusters, and provides accurate biological predictions.", "full_text": "Inferring Interaction Networks using the IBP applied\n\nto microRNA Target Prediction\n\nHai-Son Le\n\nMachine Learning Department\nCarnegie Mellon University\n\nPittsburgh, PA, USA\nhple@cs.cmu.edu\n\nZiv Bar-Joseph\n\nMachine Learning Department\nCarnegie Mellon University\n\nPittsburgh, PA, USA\n\nzivbj@cs.cmu.edu\n\nAbstract\n\nDetermining interactions between entities and the overall organization and clus-\ntering of nodes in networks is a major challenge when analyzing biological and\nsocial network data. Here we extend the Indian Buffet Process (IBP), a nonpara-\nmetric Bayesian model, to integrate noisy interaction scores with properties of\nindividual entities for inferring interaction networks and clustering nodes within\nthese networks. We present an application of this method to study how microR-\nNAs regulate mRNAs in cells. 
Analysis of synthetic and real data indicates that the\nmethod improves upon prior methods, correctly recovers interactions and clusters,\nand provides accurate biological predictions.\n\n1\n\nIntroduction\n\nDetermining interactions between entities based on observations is a major challenge when analyz-\ning biological and social network data [1, 12, 15]. In most cases we can obtain information regarding\neach of the entities (individuals in social networks and proteins in biological networks) and some\ninformation about possible relationships between them (friendships or conversation data for social\nnetworks and motif or experimental data for biology). The goal is then to integrate these datasets\nto recover the interaction network between the entities being studied. To simplify the analysis of\nthe data it is also bene\ufb01cial to identify groups, or clusters, within these interaction networks. Such\ngroups can then be mapped to speci\ufb01c demographics or interests in the case of social networks or to\nmodules and pathways in biological networks [2].\nA large number of generative models were developed to represent entities as members of a number of\nclasses. Many of these models are based on the stochastic blockmodel introduced in [19]. While the\nnumber of classes in such models could be \ufb01xed, or provided by the user, nonparametric Bayesian\nmethods have been applied to allow this number to be inferred based on the observed data [9]. The\nstochastic blockmodel was also further extended in [1] to allow mixed membership of entities within\nthese classes. An alternate approach is to use latent features to describe entities. [10] proposed a\nnonparametric Bayesian matrix factorization method to learn the latent factors in relational data\nwhereas [12] presented a nonparametric model to study binary link data. 
All of these methods rely on the pairwise link and interaction data and in most cases do not utilize properties of the individual entities when determining interactions.
Here we present a model that extends the Indian Buffet Process (IBP) [7], a nonparametric Bayesian prior over infinite binary matrices, to learn the interactions between entities with an unbounded number of groups. Specifically, we represent each group as a latent feature and define interactions between entities within each group. Such a latent feature representation has been used in the past to describe entities [7, 10, 12], and the IBP is an appropriate nonparametric prior to infer the number of latent features. However, unlike the IBP, our model utilizes interaction scores as priors and so the model is no longer exchangeable. We thus extend the IBP by integrating it with Markov random field (MRF) constraints, specifically pairwise potentials as in the Ising model. MRF priors have been combined with Dirichlet Process mixture models for image segmentation in related work by Orbanz and Buhmann [13]. Pairwise information is also used in the distance dependent Chinese restaurant process [4] to encourage similar objects to be clustered. Our model is well suited for cases in which we are provided with information on both link structure and the outcome of the underlying interactions. In social networks such data can come from observations of conversations between individuals followed by actions of the specific individuals (for example, travel), whereas in biology it is suited for regulatory networks as discussed below.
We apply our model to study the microRNA (miRNA) target prediction problem. miRNAs were recently discovered as a class of regulatory RNA molecules that regulate the levels of messenger RNAs (mRNAs) (which are later translated to proteins) by binding and inhibiting their specific targets [15]. 
They were shown to play an important role in a number of diseases including cancer,\nand determining the set of genes that are targeted by each miRNA is an important question when\nstudying these diseases. Several methods were proposed to predict targets of miRNAs based on\ntheir sequence1. While these predictions are useful, due to the short length of miRNAs, they lead\nto many false positives and some false negatives [8].\nIn addition to sequence information, it is\nnow possible to obtain the expression levels of miRNAs and their predicted mRNA targets using\nmicroarrays. Since miRNAs inhibit their direct targets, integrating sequence and expression data\ncan improve predictions regarding the interactions between miRNAs and their targets. A number of\nmethods based on regression analysis were suggested for this task [8, 17]. While methods utilizing\nexpression data improved upon methods that only used sequence data, they often treated each target\nmRNA in isolation. In contrast, it has now been shown that each miRNA often targets hundreds\nof genes, and that miRNAs often work in groups to achieve a larger impact [14]. Thus, rather than\ntrying to infer a separate regression model for each mRNA we use our IBP extended model to infer\na joint regression model for a cluster of mRNAs and the set of miRNAs that regulate them. Such a\nmodel would provide statistical con\ufb01dence (since it combines several observations) while adhering\nmore closely to the underlying biology. In addition to inferring the interactions in the dataset such a\nmodel would also provide a grouping for genes and miRNAs which can be used to improve function\nprediction.\n\n2 Computational model\n\nFirstly, we derive a distribution on in\ufb01nite binary matrices starting with a \ufb01nite model and taking the\nlimit as the number of features goes to in\ufb01nity. 
Secondly, we describe the application of our model to the miRNA target prediction problem using a Gaussian additive model.

2.1 Interaction model

Let zik denote the (i, k) entry of a matrix Z and let zk denote the kth column of Z. The group membership of N entities is defined by a (latent) binary matrix Z where zik = 1 if entity i belongs to group k. Given Z, we say that entity i interacts with entity j if zikzjk = 1 for some k. Note that two entities can interact through many groups where each group represents one type of interaction. In many cases, a prior on such interactions can be obtained. Assume we have an N × N symmetric matrix W, where wij indicates the degree to which we believe that entities i and j interact: wij > 0 if entities i and j are more likely to interact and wij < 0 if they are less likely to do so.
Nonparametric prior for Z: Griffiths and Ghahramani [7] proposed the Indian Buffet Process (IBP) as a nonparametric prior distribution on sparse binary matrices Z. The IBP can be derived from a simple stochastic process, described by a culinary metaphor. In this metaphor, there are N customers (entities) entering a restaurant and choosing from an infinite array of dishes (groups). The first customer tries Poisson(α) dishes, where α is a parameter. The remaining customers enter one after the other. The ith customer tries a previously sampled dish k with probability mk/i, where mk is the number of previous customers who have sampled this dish. He then samples a Poisson(α/i) number of new dishes. This process defines an exchangeable distribution on the equivalence classes of Z, which are the sets of binary matrices that map to the same left-ordered binary matrix [7]. 
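The culinary construction above is easy to simulate. The sketch below (our own illustration, not code from the paper) draws a binary matrix Z from the plain IBP prior, before any pairwise potentials are added; the expected number of non-zero columns is α·Σi 1/i, matching the Poisson(α/i) new-dish draws:

```python
import numpy as np

# Sample a binary matrix Z from the plain IBP prior via the restaurant
# metaphor: customer 1 tries Poisson(alpha) dishes; customer i tries a
# previously sampled dish k with probability m_k / i and then tries
# Poisson(alpha / i) new dishes.  (Function name is ours.)
def sample_ibp(n_customers, alpha, rng):
    columns = []  # one list of 0/1 entries per dish, grown customer by customer
    counts = []   # m_k: number of customers who have sampled dish k
    for i in range(1, n_customers + 1):
        for k in range(len(columns)):
            take = int(rng.random() < counts[k] / i)
            columns[k].append(take)
            counts[k] += take
        for _ in range(rng.poisson(alpha / i)):
            # new dish, first taken by customer i
            columns.append([0] * (i - 1) + [1])
            counts.append(1)
    if not columns:
        return np.zeros((n_customers, 0), dtype=int)
    return np.array(columns, dtype=int).T  # rows: customers, cols: dishes
```

Each call returns an N × K+ matrix whose number of columns K+ varies from run to run, as in the process described above.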
Exchangeability means that the order of the customers does not affect the distribution and that permutation of the data does not change the resulting likelihood.
(Footnote 1: Genes that are targets of miRNAs contain the reverse complement of part of the miRNA sequence.)
The prior knowledge on interactions discussed above (encoded by W) violates the exchangeability of the IBP, since the group membership probability depends on the identities of the entities, whereas exchangeability means that permutation of entities does not change the probability. In [11], Miller et al. presented the phylogenetic Indian Buffet Process (pIBP), where they used a tree representation to express non-exchangeability. In their model, the relationships among customers are encoded as a tree, allowing them to exploit the sum-product algorithm in defining the updates for an MCMC sampler without significantly increasing the computational burden when performing inference.
We combine the IBP with pairwise potentials using W, constraining the dish selection of customers. Similar to the pIBP, the entries in zk are not chosen independently given πk but rather depend on the particular assignment of the remaining entries. In the following sections, we start with a model with a finite number of groups and consider the limit as the number of groups grows to derive the nonparametric prior. Note that in our model, as in the original IBP [7], while the number of rows is finite, the number of columns (features) could be infinite. We can thus define a prior on interactions between entities (since their number is known in advance) while still allowing for an infinite number of groups. 
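The role of W can be made concrete with a small sketch (our own illustration; we assume the Ising-style column potential exp(Σi<j wij zik zjk) defined below), which up-weights columns that group pairs with wij > 0 and down-weights columns that group pairs with wij < 0:

```python
import numpy as np

# Unnormalized Ising-style potential for one feature column z_k given the
# prior interaction matrix W (assumed symmetric, diagonal ignored):
# Phi(z_k) = exp(sum_{i<j} w_ij * z_ik * z_jk).  Names are ours.
def column_potential(z_col, W):
    z = np.asarray(z_col, dtype=float)
    W = np.asarray(W, dtype=float)
    off_diag = W - np.diag(np.diag(W))
    # z^T W z counts each unordered pair {i, j} twice, hence the factor 1/2
    return float(np.exp(0.5 * z @ off_diag @ z))
```

For example, with w01 = 1 a column containing both entities 0 and 1 gets potential exp(1), while a column containing neither (or only one) of a pair contributes no pairwise term.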
This flexibility allows the group parameters to be drawn from an infinite mixture of priors, which may lead to identical groups of entities, each with a different set of parameters.

2.1.1 Prior on finite matrices Z

We have an N × K binary matrix Z where N is the number of entities and K is a fixed, finite number of groups. In the IBP, each group/column k is associated with a parameter πk, chosen from a Beta(α/K, 1) prior distribution where α is a hyperparameter:

πk | α ∼ Beta(α/K, 1)
P(zk | πk) = exp( Σi [ (1 − zik) log(1 − πk) + zik log πk ] )    (1)

The joint probability of a column k and πk in the IBP is:

P(zk, πk | α) = (1 / B(α/K, 1)) exp( Σi [ (1 − zik) log(1 − πk) + zik log πk ] + (α/K − 1) log πk )    (2)

where B(·) is the Beta function.
For our model, we add the new pairwise potentials on memberships of entities. 
Defining Φzk = exp( Σi<j wij zik zjk ), the joint probability of a column k and πk becomes:

P(zk, πk | α) = (1/Z′) Φzk exp( Σi [ (1 − zik) log(1 − πk) + zik log πk ] + (α/K − 1) log πk )

Note that Φzk > 0. By integrating over all values of πk, we get the marginal probability of a binary matrix Z (writing mk = Σi zik):

P(Z) = ∏k=1..K ∫[0,1] P(zk, πk | α) dπk    (3)
= ∏k=1..K (1/Z′) Φzk ∫[0,1] exp( (α/K + mk − 1) log πk + (N − mk) log(1 − πk) ) dπk    (4)
= ∏k=1..K (1/Z′) Φzk B(α/K + mk, N − mk + 1)    (5)

The partition function Z′ can be written as Z′ = Σh=0..2^N−1 Φh B(α/K + mh, N − mh + 1), where h ranges over the 2^N possible binary columns.

2.1.2 Taking the infinite limit

The probability of a particular lof-equivalence class of binary matrices, [Z], is:

P([Z]) = ΣZ∈[Z] P(Z) = ( K! / ∏h=0..2^N−1 Kh! ) ∏k=1..K (1/Z′) Φzk B(mk + α/K, N − mk + 1)    (6)

where Kh is the number of columns of Z with history h. Taking the limit when K → ∞, we can show that, with Ψ = Σh=1..2^N−1 Φh (N − mh)!(mh − 1)! / N!:

limK→∞ P([Z]) = limK→∞ ( K! / ∏h=0..2^N−1 Kh! ) ∏k=1..K (1/Z′) Φzk B(α/K, N + 1) · ( B(mk + α/K, N − mk + 1) / B(α/K, N + 1) )    (7)
= ( α^K+ / ∏h=1..2^N−1 Kh! ) exp(−αΨ) ∏k=1..K+ Φzk (N − mk)!(mk − 1)! / N!    (8)

where K+ is the number of non-zero columns of Z. The detailed derivations are shown in the Appendix.

2.1.3 The generative process

We now describe a generative stochastic process for Z. 
It can be understood by a culinary metaphor, where each row of Z corresponds to a customer and each column corresponds to a dish. We denote by h(i) the value of zik in the complete history h. With Φ̄h = Φh (N − mh)!(mh − 1)! / N!, we define Ψi for each customer, so that Ψ = Σi Ψi. Finally, let z [...]

[...] we use two sets of coefficients (s = [s1, . . . , sK]T with sk > 0 for groups and r = [r1, . . . , rR]T for all miRNAs) to model the downregulation effect. These coefficients represent the baseline effect of group members and the strength of specific miRNAs, respectively. Using these parameters, the expression level of a specific mRNA can be explained by summing over the expression profiles of all miRNAs targeting that mRNA:

xi ∼ N( µ − Σj (rj + Σk:uikvjk=1 sk) yj , σ2 I )    (10)

where µ represents the baseline expression for this mRNA and σ is used to represent measurement noise. Thus, under this model, the expression values of an mRNA are reduced from their baseline by a linear combination of the expression values of the miRNAs that target it. The probability of the observed data given Z is:

P(X, Y | Z, Θ) ∝ exp( −(1/(2σ2)) Σi (xi − x̄i)T (xi − x̄i) )    (11)

with Θ = {µ, σ2, s, r} and x̄i = µ − Σj (rj + Σk:uikvjk=1 sk) yj.

2.2.2 Priors for model variables

We use the following prior distributions for the variables in our model:

r ∼ N(0, σ2r I)    sk ∼ Gamma(αs, βs)    µ ∼ N(0, σ2µ I)    1/σ2 ∼ Gamma(αv, βv)

where the α and β are the shape and scale parameters. The parameters are given hyperpriors: 1/σ2µ ∼ Gamma(aµ, bµ). 
Similarly, 1/σ2r ∼ Gamma(ar, br), and αs, βs, αv, βv are also given Gamma hyperpriors.

3 Inference by MCMC

As with many nonparametric Bayesian models, exact inference is intractable. Instead, we use a Markov Chain Monte Carlo (MCMC) method to sample from the posterior distribution of Z and Θ. Although our model allows Z to have an infinite number of columns, we only need to keep track of the non-zero columns of Z, an important property exploited by several nonparametric Bayesian models [7]. Our sampling algorithm uses a mix of Gibbs and Metropolis-Hastings steps to generate each new sample.

3.1 Sampling from populated columns of Z

Let m−ik be the number of one entries in zk, not including zik. Also let z−ik denote the entries of zk except zik, and let Z−(ik) be the entire matrix Z except zik. The probability of an entry given the remaining entries in a column can be derived by considering an ordering of customers such that customer i is the last person in line and using the generative process in Section 2.1.3:

P(zik = 1 | z−ik) = Φ̄z
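One update of this Gibbs step can be sketched in code. The sketch below is a schematic simplification with names of our own choosing: it combines the standard IBP conditional for populated columns (prior odds m−ik/(N − m−ik)) with the Ising-style pairwise factor from W and a generic model log-likelihood, rather than the paper's exact Φ̄-based expression:

```python
import numpy as np

# One Gibbs update for z_ik in a populated column k of Z.  Schematic
# simplification (names ours): prior odds m_{-ik} / (N - m_{-ik}) from the
# standard IBP conditional, an extra factor exp(sum_{j != i} w_ij z_jk)
# when z_ik = 1, and a model log-likelihood supplied by loglik_fn(Z).
def gibbs_update_zik(Z, i, k, W, loglik_fn, rng):
    Z = Z.copy()
    N = Z.shape[0]
    m = int(Z[:, k].sum()) - int(Z[i, k])  # m_{-ik}
    if m == 0:
        return Z  # emptying/creating a column is handled by a separate move
    Z1 = Z.copy()
    Z1[i, k] = 1
    Z0 = Z.copy()
    Z0[i, k] = 0
    mrf = float(W[i] @ Z0[:, k])  # sum over j != i of w_ij z_jk
    log_p1 = np.log(m) + mrf + loglik_fn(Z1)
    log_p0 = np.log(N - m) + loglik_fn(Z0)
    p1 = 1.0 / (1.0 + np.exp(log_p0 - log_p1))
    Z[i, k] = int(rng.random() < p1)
    return Z
```

With W = 0 and a flat likelihood this reduces to the familiar IBP conditional P(zik = 1 | z−ik) = m−ik/N; strong positive (negative) wij pushes the entry toward 1 (0).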