{"title": "Efficient Bayesian Inference for Dynamically Changing Graphs", "book": "Advances in Neural Information Processing Systems", "page_first": 1441, "page_last": 1448, "abstract": null, "full_text": "Adaptive Bayesian Inference\n\nUmut A. Acar\u2217\nToyota Tech. Inst.\nChicago, IL\numut@tti-c.org\n\nAlexander T. Ihler\nU.C. Irvine\nIrvine, CA\nihler@ics.uci.edu\n\nRamgopal R. Mettu\u2020\nUniv. of Massachusetts\nAmherst, MA\nmettu@ecs.umass.edu\n\n\u00d6zg\u00fcr S\u00fcmer\nUniv. of Chicago\nChicago, IL\nosumer@cs.uchicago.edu\n\nAbstract\n\nMotivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates, etc.) must be updated, we study the problem of adaptive inference in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods, all in expected logarithmic time. We give an implementation of our algorithm and provide experiments showing that it can yield speedups of up to two orders of magnitude over the sum-product algorithm when answering queries and responding to dynamic changes.\n\n1 Introduction\n\nGraphical models [14, 8] are a powerful tool for probabilistic reasoning over sets of random variables. Problems of inference, including marginalization and MAP estimation, form the basis of statistical approaches to machine learning. In many applications, we need to perform inference under dynamically changing conditions, such as the acquisition of new evidence or an alteration of the conditional relationships which make up the model. 
Such changes arise naturally in the experimental setting, where the model quantities are empirically estimated and may change as more data are collected, or in which the goal is to assess the effects of a large number of possible interventions. Motivated by such applications, Delcher et al. [6] identify dynamic Bayesian inference as the problem of performing Bayesian inference on a dynamically changing graphical model. Dynamic changes to the graphical model may include changes to the observed evidence, to the structure of the graph itself (such as edge or node insertions/deletions), and changes to the conditional relationships among variables.\nTo see why adapting to dynamic changes is difficult, consider the simple problem of Bayesian inference in a Markov chain with n variables. Suppose that all marginal distributions have been computed in O(n) time using the sum-product algorithm, and that some conditional distribution at a node u is subsequently updated. One way to update the marginals would be to recompute the messages computed by sum-product from u to other nodes in the network. This can take \u2126(n) time because, regardless of where u is in the network, there is always another node v at distance \u2126(n) from u. A similar argument holds for general tree-structured networks. Thus, simply updating sum-product messages can be costly in applications where marginals must be adaptively updated after changes to the model (see Sec. 5 for further discussion).\nIn this paper, we present a technique for efficient adaptive inference on graphical models. For a tree-structured graphical model with n nodes, our approach supports the computation of any marginal, updates to conditional probability distributions (including observed evidence) and edge insertions\n\n\u2217U. A. Acar is supported by a gift from Intel.\n\u2020R. R. 
Mettu is supported by a National Science Foundation CAREER Award (IIS-0643768).\n\nand deletions in expected O(log n) time. As an example of where adaptive inference can be effective, consider a computational biology application that requires computing the state of the active site in a protein as the user modifies the protein (e.g., mutagenesis). In this application, we can represent the protein with a graphical model and use marginal computations to determine the state of the active site. We reflect the modifications to the protein by updating the graphical model representation and performing marginal queries to obtain the state of the active site. We show in Sec. 5 that our approach can achieve a speedup of one to two orders of magnitude over the sum-product algorithm in such applications.\nOur approach achieves logarithmic update and query times by mapping an arbitrary tree-structured graphical model into a balanced representation that we call a cluster tree (Sec. 3\u20134). We perform an O(n)-time preprocessing step to compute the cluster tree using a technique known as tree contraction [13]. We ensure that for an input network with n nodes, the cluster tree has an expected depth of O(log n) and expected size O(n). We show that the nodes in the cluster tree can be tagged with partial computations (corresponding to marginalizations of subtrees of the input network) in a way that allows marginal computations and changes to the network to be performed in O(log n) expected time. We give simulation results (Sec. 5) that show that our algorithm can achieve a speedup of one to two orders of magnitude over the sum-product algorithm. 
Although we focus primarily on the problem of answering marginal queries, it is straightforward to generalize our algorithms to other, similar inference goals, such as MAP estimation and evaluating the likelihood of evidence. We note that although tree-structured graphs provide a relatively restrictive class of models, junction trees [14] can be used to extend some of our results to more general graphs. In particular, we can still support changes to the parameters of the distribution (evidence and conditional relationships), although changes to the underlying graph structure become more difficult. Additionally, a number of more sophisticated graphical models require efficient inference over trees at their core, including learning mixtures of trees [12] and tree-reparameterized max-product [15]. Both these methods involve repeatedly performing a message passing algorithm over a set of trees with changing parameters or evidence, making efficient updates and recomputations a significant issue.\n\nRelated Work. It is important to contrast our notion of adapting to dynamic updates to the graphical model (due to changes in the evidence, or alterations of the structure and distribution) with the potentially more general definition of dynamic Bayes\u2019 nets (DBNs) [14]. Specifically, a DBN typically refers to a Bayes\u2019 net in which the variables have an explicit notion of time, and past observations have some influence on estimates about the present and future. Marginalizing over unobserved variables at time t\u22121 typically produces increased complexity in the model of variables at time t. 
However, in both [6] and this work, the emphasis is on performing inference with current information only, and efficiency is obtained by leveraging the similarity between the previous and newly updated models.\nOur work builds on previous work by Delcher, Grove, Kasif and Pearl [6]; they give an algorithm to update Bayesian networks dynamically as the observed variables in the network change and to compute belief queries of hidden variables in logarithmic time. The key difference between their work and ours is that their algorithm only supports updates to observed evidence, and does not support dynamic changes to the graph structure (i.e., insertion/deletion of edges) or to conditional probabilities. In many applications it is important to consider the effect of changes to conditional relationships between variables; for example, to study protein structure (see Sec. 5 for further discussion). In fact, Delcher et al. cite structural updates to the given network as an open problem. Another difference is the use of tree contraction: they use tree contractions to answer queries and perform updates, whereas we use tree contractions to construct a cluster tree, which we then use to perform queries and all other updates (except for insertions/deletions). We provide an implementation and show that this approach yields significant speedups.\nOur approach to clustering factor graphs using tree contractions is based on previous results showing that tree contractions can be updated in expected logarithmic time under certain dynamic changes by using a general-purpose change-propagation algorithm [2]. The approach has also been applied to a number of basic problems on trees [3] but has not been considered in the context of statistical inference. 
The change-propagation approach used in this work has also been extended to provide a general-purpose technique for updating computations under changes to their data, and applied to a number of applications (e.g., [1]).\n\n2 Background\n\nGraphical models provide a convenient formalism for describing the structure of a function g defined over a set of variables x1, . . . , xn (most commonly a joint probability distribution over the xi). Graphical models use this structure to organize computations and create efficient algorithms for many inference tasks over g, such as finding a maximum a-posteriori (MAP) configuration, marginalization, or computing data likelihood. For the purposes of this paper, we assume that each variable xi takes on values from some finite set, denoted Ai. We write the operation of marginalizing over xi as \u2211xi, and let Xj represent an index-ordered subset of variables and f(Xj) a function defined over those variables, so that for example if Xj = {x2, x3, x5}, then the function f(Xj) = f(x2, x3, x5). We use X to indicate the index-ordered set of all {x1, . . . , xn}.\n\nFactor Graphs. A factor graph [10] is one type of graphical model, similar to a Bayes\u2019 net [14] or Markov random field [5], used to represent the factorization structure of a function g(x1, . . . , xn). In particular, suppose that g decomposes into a product of simpler functions, g(X) = \u220fj fj(Xj), for some collection of real-valued functions fj, called factors, whose arguments are (index-ordered) sets Xj \u2286 X. A factor graph consists of a graph-theoretic abstraction of g\u2019s factorization, with vertices of the graph representing variables xi and factors fj. Because of the close correspondence between these quantities, we abuse notation slightly and use xi to indicate both the variable and its associated vertex, and fj to indicate both the factor and its vertex.\nDefinition 2.1. 
A factor graph is a bipartite graph G = (X + F, E) where X = {x1, x2, . . . , xn} is a set of variables, F = {f1, f2, . . . , fm} is a set of factors, and E \u2286 X \u00d7 F. A factor tree is a factor graph G where G is a tree. The neighbor set N(v) of a vertex v is the (index-ordered) set of vertices adjacent to vertex v. The graph G represents the function g(X) = \u220fj fj(Xj) if, for each factor fj, the arguments of fj are its neighbors in G, i.e., N(fj) = Xj.\nOther types of graphical models, such as Bayes\u2019 nets [14], can be easily converted into a factor graph representation. When the Bayes\u2019 net is a polytree (singly connected directed acyclic graph), the resulting factor graph is a factor tree.\n\nThe Sum-Product Algorithm. The factorization of g(X) and its structure as represented by the graph G can be used to organize various computations about g(X) efficiently. For example, the marginals of g(X), defined for each i by gi(xi) = \u2211X\\{xi} g(X), can be computed using the sum-product algorithm.\nSum-product is best described in terms of messages sent between each pair of adjacent vertices in the factor graph. For every pair of neighboring vertices (xi, fj) \u2208 E, the vertex xi sends a message \u00b5xi\u2192fj as soon as it receives the messages from all of its neighbors except for fj, and similarly for the message from fj to xi. 
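As a small concrete illustration of this message-passing scheme, the following sketch (our own toy example, not the paper's implementation) runs sum-product by hand on a two-variable chain with a hypothetical unary factor f1(x1) and pairwise factor f12(x1, x2); all names and numbers here are made up:

```python
import numpy as np

# Toy factor tree: f1 -- x1 -- f12 -- x2, with binary-valued variables.
f1 = np.array([0.6, 0.4])              # unary factor on x1
f12 = np.array([[0.9, 0.1],
                [0.2, 0.8]])           # pairwise factor f12(x1, x2)

# Variable-to-factor message: product of messages from x1's other
# neighbors; here the only other neighbor is f1, whose message is f1 itself.
mu_x1_to_f12 = f1

# Factor-to-variable message: sum over x1 of f12(x1, x2) times the
# incoming message from x1.
mu_f12_to_x2 = (f12 * mu_x1_to_f12[:, None]).sum(axis=0)

# The marginal g2(x2) is the product of all incoming messages at x2;
# here x2 has a single neighbor, so it is just mu_f12_to_x2.
g2 = mu_f12_to_x2
print(g2 / g2.sum())  # -> [0.62 0.38]
```

On a larger tree, sum-product simply repeats these two update rules following a leaf-to-root (and back) schedule.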
The messages between these vertices take the form of a real-valued function of the variable xi; for discrete-valued xi this can be represented as a vector of length |Ai|. The message \u00b5xi\u2192fj sent from a variable vertex xi to a neighboring factor vertex fj, and the message \u00b5fj\u2192xi from factor fj to variable xi, are given by\n\n\u00b5xi\u2192fj(xi) = \u220ff\u2208N(xi)\\fj \u00b5f\u2192xi(xi),    \u00b5fj\u2192xi(xi) = \u2211Xj\\xi fj(Xj) \u220fx\u2208Xj\\xi \u00b5x\u2192fj(x).\n\nOnce all the messages (2|E| in total) are sent, we can calculate the marginal gi(xi) by simply multiplying all the incoming messages, i.e., gi(xi) = \u220ff\u2208N(xi) \u00b5f\u2192xi(xi). Sum-product can be thought of as selecting an efficient elimination ordering of variables (leaf to root) and marginalizing in that order.\n\nOther Inferences. Although in this paper we focus on marginal computations using sum-product, similar message passing operations can be generalized to other tasks. For example, the operations of sum-product can be used to compute the data likelihood of any observed evidence; such computations are an inherent part of learning and model comparisons (e.g., [12]). More generally, similar algorithms can be defined to compute functions over any semiring possessing the distributive property [11]. Most commonly, the max operation produces a dynamic programming algorithm (\u201cmax-product\u201d) to compute joint MAP configurations [15].\n\nFigure 1: Clustering a factor graph with rake, compress, and finalize operations (rounds 1\u20134).\n\n3 Constructing the Cluster Tree\n\nIn this section, we describe an algorithm for constructing a balanced representation of the input graphical model, which we call a cluster tree. 
Given the input graphical model, we first apply a clustering algorithm that hierarchically clusters the graphical model, and then apply a labeling algorithm that labels the clusters with cluster functions that can be used to compute marginal queries.\n\nClustering Algorithm. Given a factor graph as input, we first tag each node v with a unary cluster that consists of v, and each edge (u, v) with a binary cluster that consists of the edge (u, v). We then cluster the tree hierarchically by applying the rake, compress, and finalize operations. When applied to a leaf node v with neighbor u, the rake operation deletes v and the edge (u, v), and forms a unary cluster by combining the clusters which tag either v or (u, v); u is tagged with the resulting cluster. When applied to a degree-two node v with neighbors u and w, a compress operation deletes v and the edges (u, v) and (v, w), inserts the edge (u, w), and forms a binary cluster by combining the clusters which tag the deleted node and edges; (u, w) is then tagged with the resulting cluster. A finalize operation is applied when the tree consists of a single node (when no edges remain); it constructs a final cluster that consists of all the clusters with which the final node is tagged.\nWe cluster a tree T by applying rake and compress operations in rounds until no more edges remain. Each round consists of the following two steps: (1) apply the rake operation to each leaf; (2) apply the compress operation to an independent set of degree-two nodes. We choose a random independent set: we flip a coin for each node in each round and apply compress to a degree-two node only if it flips heads and its two neighbors flip tails. This ensures that no two adjacent nodes apply compress simultaneously. When all edges are deleted, we complete the clustering by applying the finalize operation.\nFig. 1 shows a four-round clustering of a factor graph and Fig. 
2 shows the corresponding cluster tree. In round 1, nodes f1, f2, f5 are raked and f4 is compressed. In round 2, x1, x2, x4 are raked. In round 3, f3 is raked. A finalize operation is applied in round 4 to produce the final cluster. The leaves of the cluster tree correspond to the nodes and the edges of the factor graph. Each internal node \u00afv corresponds to a unary or a binary cluster formed by deleting v. The children of an internal node are the edges and the nodes deleted during the operation that forms the cluster. For example, the cluster \u00aff1 is formed by the rake operation applied to f1 in round 1. The children of \u00aff1 are node f1 and edge (f1, x1), which are deleted during that operation.\n\nFigure 2: A cluster tree.\n\nLabeling Algorithm. After building the cluster tree, we compute cluster functions along with a notion of orientation for neighboring clusters in a second pass, which we call labeling.1 The cluster function at a node \u00afv in the tree is computed recursively using the cluster functions at \u00afv\u2019s child clusters, which we denote S\u00afv = {\u00afv1, . . . , \u00afvk}. Intuitively, each cluster function corresponds to a partial marginalization of the factors contained in cluster \u00afv.\nSince each cluster function is defined over a subset of the variables in the original graph, we require some additional notation to represent these sets. Specifically, for a cluster \u00afv, let A(\u00afv) be the arguments of its cluster function, and let V(\u00afv) be the set of all arguments of its children, V(\u00afv) = \u222ai A(\u00afvi). In a slight abuse of notation, we let A(v) be the arguments of the node v in the original graph, so that if v is a variable node A(v) = v and if v is a factor node A(v) = N(v).\nThe cluster functions c\u00afv(\u00b7) and their arguments are then defined recursively, as follows. For the base case, the leaf nodes of the cluster tree correspond to nodes v in the original graph, and we define cv using the original variables and factors. If v is a factor node fj, we take cv(A(v)) = fj(Xj), and if v is a variable node xi, A(v) = xi and cv = 1. For nodes of the cluster tree corresponding to edges (u, v) of the original graph, we simply take A(u, v) = \u2205 and cu,v = 1.\nThe cluster function at an internal node of the cluster tree is then given by combining the cluster functions of its children and marginalizing over as many variables as possible. Let \u00afv be the internal node corresponding to the removal of v in the original graph. If \u00afv is a binary cluster (u, w), that is, at v\u2019s removal it had two neighbors u and w, then c\u00afv is given by\n\nc\u00afv(A(\u00afv)) = \u2211V(\u00afv)\\A(\u00afv) \u220f\u00afvi\u2208S\u00afv c\u00afvi(A(\u00afvi)),\n\nwhere the arguments A(\u00afv) = V(\u00afv) \u2229 (A(u) \u222a A(w)). For a unary cluster \u00afv, where v had a single neighbor u at its removal, c\u00afv(\u00b7) is calculated in the same way with A(w) = \u2205.\nWe also compute an orientation for each cluster\u2019s neighbors based on their proximity to the cluster tree\u2019s root. This is also calculated recursively using the orientations of each node\u2019s ancestors. For a unary cluster \u00afv with parent \u00afu in the cluster tree, we define in(\u00afv) = \u00afu. 
For a binary cluster \u00afv with neighbors u, w at v\u2019s removal, define in(\u00afv) = \u00afw and out(\u00afv) = \u00afu if \u00afw = in(\u00afu); otherwise in(\u00afv) = \u00afu and out(\u00afv) = \u00afw.\nWe now describe the efficiency of our clustering and labeling algorithms and show that the resulting cluster tree is linear in the size of the input factor graph.\nTheorem 1 (Hierarchical Clustering). A factor tree of n nodes with maximum degree k can be clustered and labeled in expected O(d^(k+2) n) time, where d is the domain size of each variable in the factor tree. The resulting cluster tree has exactly 2n \u2212 1 leaves and n internal clusters (nodes) and expected O(log n) depth, where the expectation is taken over internal randomization (over the coin flips). Furthermore, the cluster tree has the following properties: (1) each cluster has at most k + 1 children, and (2) if \u00afv = (u, w) is a binary cluster, then \u00afu and \u00afw are ancestors of \u00afv, and one of them is the parent of \u00afv.\nProof. Consider first the construction of the cluster tree. The time and the depth bound follow from previous work [2]. The bound on the number of nodes holds because the leaves of the cluster tree correspond to the n \u2212 1 edges and n nodes of the factor graph. To see that each cluster has at most k + 1 children, note that a rake or compress operation deletes one node and the at most k edges incident on that node. Every edge appearing in any level of the tree contraction algorithm is represented as a binary cluster \u00afv = (u, w) in the cluster tree. Whenever an edge is removed, one of the nodes incident to that edge, say u, is also removed, making \u00afu the parent of \u00afv. The fact that \u00afw is also an ancestor of \u00afv follows from an induction argument on the levels.\nConsider the labeling step. 
By inspection of the labeling algorithm, the computation of the arguments A(\u00b7) and V(\u00b7) requires O(k) time. To bound the time for computing a cluster function, observe that A(\u00afv) is always a singleton set if \u00afv is a unary cluster, and A(\u00afv) has at most two variables if \u00afv is a binary cluster. Therefore, |V(\u00afv)| \u2264 k + 2. The number of operations required to compute the cluster function at \u00afv is bounded by O(|S\u00afv| d^|V(\u00afv)|), where S\u00afv are the children of \u00afv. Since each cluster can appear only once as a child, \u2211|S\u00afv| is O(n), and thus the labeling step takes O(d^(k+2) n) time. Although the running time may appear large, note that the representation of the factor graph takes O(d^k n) space if the functions associated with factors are given explicitly.\n\n1Although presented here as a separate labeling operation, the cluster functions can alternatively be computed at the time of the rake or compress operation, as they depend only on the children of \u00afv, and the orientations can be computed during the query operation, since they depend only on the ancestors of \u00afv.\n\n4 Queries and Dynamic Changes\n\nWe give algorithms for computing marginal queries on the cluster trees and for restructuring the cluster tree with respect to changes in the underlying graphical model. For all of these operations, our algorithms require expected logarithmic time in the size of the graphical model.\n\nQueries. We answer marginal queries at a vertex v of the graphical model by using the cluster tree. At a high level, the idea is to find the leaf of the cluster tree corresponding to v and compute the messages along the path from the root of the cluster tree to v. 
Using the orientations computed\nduring the tagging pass, for each cluster \u00afv we de\ufb01ne the following messages:\n\n(cid:17)\n(cid:17)\n\n,\n\n\u00afui\u2208S\u00afu\\{\u00afv} c\u00afui(A(\u00afui))\n\u00afui\u2208S\u00afu\\{\u00afv} c\u00afui(A(\u00afui))\n\nif \u00afu = in(\u00afv)\n\n,\n\nif \u00afu = out(\u00afv),\n\nm\u00afu\u2192\u00afv =\n\nQ\nQ\n\nmin(\u00afu)\u2192\u00afu\n\nmout(\u00afu)\u2192\u00afu\n\n(cid:16)\n(cid:16)\n\nPV(\u00afu)\\A(\u00afv)\nPV(\u00afu)\\A(\u00afv)\n\n\uf8f1\uf8f2\uf8f3\ngi(xi) = X\n\nwhere S\u00afu is the set of the children of \u00afu. Note that for unary clusters, out(\u00b7) is unde\ufb01ned; we de\ufb01ne\nthe message in this case to be 1.\nTheorem 2 (Query). Given a factor tree with n nodes, maximum degree k, domain size d, and its\ncluster tree, the marginal at any xi can be computed with the following formula\nc\u00afvi(A(\u00afvi)),\n\nmout(xi)\u2192xi min(xi)\u2192xi\n\nY\n\nV(\u00afxi)\\{xi}\n\n\u00afvi\u2208S\u00afxi\n\nwhere S\u00afxi is the set of children of \u00afxi, in O(kdk+2 log n) time.\nMessages are computed only at the ancestors of the query node xi and downward along the path to\nxi; there are at most O(log n) nodes in this path by Theorem 1. Computing each message requires\nat most O(kdk+2) time, and any marginal query takes O(kdk+2 log n) time.\n\nDynamic Updates. Given a factor graph and its cluster tree, we can change the function of a factor\nand update the cluster tree by starting at the leaf of the cluster tree that corresponds to the factor and\nrelabeling all the clusters on the path to the root. Updating these labels suf\ufb01ces, because the label of\na cluster is a function of its children only. 
Since relabeling a cluster takes O(k d^(k+2)) time and the cluster tree has expected O(log n) depth, any update requires O(k d^(k+2) log n) time.\nTo allow changes to the factor graph itself by insertion/deletion of edges, we maintain a forest of factor trees and the corresponding cluster trees (obtained by clustering the trees one by one). We also maintain the sequence of operations used to construct each cluster tree, i.e., a data structure which represents the state of the clustering at each round. Note that this structure is also of size O(n), since at each round a constant fraction of nodes are removed (raked or compressed) in expectation.\nSuppose now that the user inserts an edge that connects two trees, or deletes an edge connecting two subtrees. It turns out that both operations have only a limited effect on the sequence of clustering operations performed during construction, affecting only a constant number of nodes at each round of the process. Using a general-purpose change propagation technique (detailed in previous work [2, 1]), the necessary alterations can be made to the cluster tree in expected O(log n) time. Change propagation gives us a new cluster tree that corresponds to the cluster tree we would have obtained by re-clustering from scratch, conditioned on the same internal randomization process.\nIn addition to changing the structure of the cluster tree via change propagation, we must also change the labeling information (cluster functions and orientation) of the affected nodes, which can be done using the same process described in Sec. 3. It is a property of the tree contraction process that all such affected clusters form a subtree of the cluster tree that includes the root. 
Since change propagation affects an expected O(log n) clusters, and since each cluster can be labeled in O(k d^(k+2)) time, the new labels can be computed in O(k d^(k+2) log n) time.\nFor dynamic updates, we thus have the following theorem.\n\nTheorem 3 (Dynamic Updates). For a factor forest F of n vertices with maximum degree k and domain size d, the forest of cluster trees can be updated in expected O(k d^(k+2) log n) time under edge insertions/deletions and changes to factors.\n\n5 Implementation and Experimental Results\n\nWe have implemented our algorithm in Matlab2 and compared its performance against the standard two-pass sum-product algorithm (used to recompute marginals after dynamic changes). Fig. 3 shows the results of a simulation experiment in which we considered randomly generated factor trees between 100 and 1000 nodes, with each variable having 5^1, 5^2, or 5^3 states, so that each factor has size between 5^2 and 5^6. These factor trees correspond roughly to the junction trees of models with between 200 and 6000 nodes, where each node has up to 5 states. Our results show that the time required to build the cluster tree is comparable to one run of sum-product. Furthermore, the query and update operations in the cluster tree incur relatively small constant factors in their asymptotic running time, and are between one and two orders of magnitude faster than recomputing from scratch.\nA particularly compelling application area, and one of the original motivations for developing our algorithm, is in the analysis of protein structure. 
Graphical models constructed from protein structures have recently been used to successfully predict structural properties [17] as well as free energy [9]. These models are typically constructed by taking each node as an amino acid whose states represent its most common conformations, known as rotamers [7], and basing conditional probabilities on proximity and a physical energy function (e.g., [16]) and/or empirical data [4].\nOur algorithm is a natural choice for problems where various aspects of protein structure are allowed to change. One such application is computational mutagenesis, in which functional amino acids in a protein structure are identified by examining systematic amino acid mutations in the protein structure (i.e., to characterize when a protein \u201closes\u201d function). In this setting, performing updates to the model (i.e., mutations) and queries (i.e., the free energy or maximum likelihood set of rotameric states) to determine the effect of updates would likely be far more efficient than standard methods. Thus, our algorithm has the potential to substantially speed up computational studies that examine each of a large number of local changes to protein structure, such as in the study of protein flexibility and dynamics. Interestingly, [6] actually give a sample application in computational biology, although their model is a simple sequence-based HMM in which they consider the effect of changing the observed sequence on secondary structure only.\nThe simulation results given in Fig. 3 validate the use of our algorithm for these applications, since protein-structure-based graphical models have similar complexity to the inputs we consider: proteins range in size from hundreds to thousands of amino acids, and each amino acid typically has relatively few rotameric states and local interactions. 
As in prior work [17], our simulation considers a small number of rotamers per amino acid, but the one to two order-of-magnitude speedups obtained by our algorithm indicate that it may be possible also to handle higher-resolution models (e.g., with more rotamer states, or degrees of freedom in the protein backbone).\n\n6 Conclusion\n\nWe give an algorithm for adaptive inference in dynamically changing tree-structured Bayesian networks. Given n nodes in the network, our algorithm can support updates to the observed evidence, conditional probability distributions, as well as updates to the network structure (as long as they keep the network tree-structured) with O(n) preprocessing time and O(log n) time for queries on any marginal distribution. Our algorithm can easily be modified to maintain MAP estimates as well as compute data likelihoods dynamically, with the same time bounds. We implement the algorithm and show that it can speed up Bayesian inference by orders of magnitude after small updates to the network. Applying our algorithm on the junction tree representation of a graph yields an inference algorithm that can handle updates on conditional distributions and observed evidence in general Bayesian networks (e.g., with cycles); an interesting open question is whether updates to the network structure (i.e., edge insertions/deletions) can also be supported.\n\n2Available for download at http://www.ics.uci.edu/\u223cihler/code/.\n\nFigure 3: Log-log plot of run time for naive sum-product, building the cluster tree, computing queries, updating factors, and restructuring (adding and deleting edges). Although building the cluster tree is slightly more expensive than sum-product, each subsequent update and query is between 10 and 100 times more efficient than recomputing from scratch.\n\nReferences\n\n[1] Umut A. Acar, Guy E. Blelloch, Matthias Blume, and Kanat Tangwongsan. An experimental analysis of self-adjusting computation. 
In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2006.\n[2] Umut A. Acar, Guy E. Blelloch, Robert Harper, Jorge L. Vittes, and Maverick Woo. Dynamizing static algorithms with applications to dynamic trees and history independence. In ACM-SIAM Symposium on Discrete Algorithms (SODA), 2004.\n[3] Umut A. Acar, Guy E. Blelloch, and Jorge L. Vittes. An experimental analysis of change propagation in dynamic trees. In Workshop on Algorithm Engineering and Experimentation (ALENEX), 2005.\n[4] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne. The Protein Data Bank. Nucl. Acids Res., 28:235\u2013242, 2000.\n[5] P. Clifford. Markov random fields in statistics. In G. R. Grimmett and D. J. A. Welsh, editors, Disorder in Physical Systems, pages 19\u201332. Oxford University Press, Oxford, 1990.\n[6] A. L. Delcher, A. J. Grove, S. Kasif, and J. Pearl. Logarithmic-time updates and queries in probabilistic networks. Journal of Artificial Intelligence Research, 4:37\u201359, 1995.\n[7] R. L. Dunbrack Jr. Rotamer libraries in the 21st century. Curr Opin Struct Biol, 12(4):431\u2013440, 2002.\n[8] M. I. Jordan. Graphical models. Statistical Science, 19:140\u2013155, 2004.\n[9] H. Kamisetty, E. P. Xing, and C. J. Langmead. Free energy estimates of all-atom protein structures using generalized belief propagation. In Proceedings of the 11th Annual International Conference on Research in Computational Molecular Biology, 2007. To appear.\n[10] F. Kschischang, B. Frey, and H.-A. Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47:498\u2013519, 2001.\n[11] R. McEliece and S. M. Aji. The generalized distributive law. IEEE Transactions on Information Theory, 46(2):325\u2013343, March 2000.\n[12] Marina Meil\u0103 and Michael I. Jordan. Learning with mixtures of trees. 
Journal of Machine Learning Research, 1(1):1\u201348, October 2000.\n[13] Gary L. Miller and John H. Reif. Parallel tree contraction and its application. In Proceedings of the 26th Annual IEEE Symposium on Foundations of Computer Science, pages 487\u2013489, 1985.\n[14] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, 1988.\n[15] M. J. Wainwright, T. Jaakkola, and A. S. Willsky. Tree consistency and bounds on the performance of the max-product algorithm and its generalizations. Statistics and Computing, 14:143\u2013166, April 2004.\n[16] S. J. Weiner, P. A. Kollman, D. A. Case, U. C. Singh, G. Alagona, S. Profeta Jr., and P. Weiner. A new force field for the molecular mechanical simulation of nucleic acids and proteins. J. Am. Chem. Soc., 106:765\u2013784, 1984.\n[17] C. Yanover and Y. Weiss. Approximate inference and protein folding. In Proceedings of Neural Information Processing Systems Conference, pages 84\u201386, 2002.", "award": [], "sourceid": 663, "authors": [{"given_name": "Ozgur", "family_name": "Sumer", "institution": null}, {"given_name": "Umut", "family_name": "Acar", "institution": null}, {"given_name": "Alexander", "family_name": "Ihler", "institution": null}, {"given_name": "Ramgopal", "family_name": "Mettu", "institution": null}]}