{"title": "Non-stationary dynamic Bayesian networks", "book": "Advances in Neural Information Processing Systems", "page_first": 1369, "page_last": 1376, "abstract": "A principled mechanism for identifying conditional dependencies in time-series data is provided through structure learning of dynamic Bayesian networks (DBNs). An important assumption of DBN structure learning is that the data are generated by a stationary process\u00e2\u0080\u0094an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical models called non-stationary dynamic Bayesian networks, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data.", "full_text": "Non-stationary dynamic Bayesian networks\n\nJoshua W. Robinson and Alexander J. Hartemink\n\nDurham, NC 27708-0129\n\n{josh,amink}@cs.duke.edu\n\nDepartment of Computer Science\n\nDuke University\n\nAbstract\n\nA principled mechanism for identifying conditional dependencies in time-series\ndata is provided through structure learning of dynamic Bayesian networks\n(DBNs). An important assumption of DBN structure learning is that the data are\ngenerated by a stationary process\u2014an assumption that is not true in many impor-\ntant settings. In this paper, we introduce a new class of graphical models called\nnon-stationary dynamic Bayesian networks, in which the conditional dependence\nstructure of the underlying data-generation process is permitted to change over\ntime. 
Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data.\n\n1 Introduction\n\nStructure learning of dynamic Bayesian networks allows conditional dependencies to be identified in time-series data with the assumption that the data are generated by a distribution that does not change with time (i.e., it is stationary). An assumption of stationarity is adequate in many situations since certain aspects of data acquisition or generation can be easily controlled and repeated. However, other interesting and important circumstances exist where that assumption does not hold and potential non-stationarity cannot be ignored.\nAs one example, structure learning of DBNs has been used widely in reconstructing transcriptional regulatory networks from gene expression data [1]. But during development, these regulatory networks are evolving over time, with certain conditional dependencies between gene products being created as the organism develops, while others are destroyed. As another example, dynamic Bayesian networks have been used to identify the networks of neural information flow that operate in the brains of songbirds [2]. However, as the songbird learns from its environment, the networks of neural information flow are themselves slowly adapting to make the processing of sensory information more efficient. 
As yet another example, one can use a DBN to model traffic flow patterns. The roads upon which traffic passes do not change on a daily basis, but the dynamic utilization of those roads changes daily during morning rush, lunch, evening rush, and weekends.\nIf one collects time-series data describing the levels of gene products in the case of transcriptional regulation, neural activity in the case of neural information flow, or traffic density in the case of traffic flow, and attempts to learn a DBN describing the conditional dependencies in these time-series, one could be seriously misled if the data-generation process is non-stationary.\nHere, we introduce a new class of graphical model called a non-stationary dynamic Bayesian network (nsDBN), in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. In the remainder of the paper, we introduce and define the nsDBN framework, present a simple but elegant algorithm for efficiently learning the structure of an nsDBN from time-series data under different assumptions, and demonstrate the effectiveness of these algorithms on both simulated and experimental data.\n\n1.1 Previous work\n\nIn this paper, we are interested in identifying how the conditional dependencies between time-series change over time; thus, we focus on the task of inferring network structure as opposed to parameters of the graphical model. In particular, we are not as interested in making predictions about future data (such as spam prediction via a na\u00efve Bayes classifier) as we are in analysis of collected data to identify non-stationary relationships between variables in multivariate time-series. Here we describe the few previous approaches to identifying non-stationary networks and discuss the advantages and disadvantages of each. 
The model we describe in this paper has none of the disadvantages of the models described below, primarily because it makes fewer assumptions about the relationships between variables.\nRecent work modeling the temporal progression of networks from the social networks community includes an extension to the discrete temporal network model [3], in which the networks are latent (unobserved) variables that generate observed time-series data [4]. Unfortunately, this technique has certain drawbacks: the variable correlations remain constant over time, only undirected edges can be identified, and segment or epoch divisions must be identified a priori.\nIn the continuous domain, some research has focused on learning the structure of a time-varying Gaussian graphical model [5] with a reversible-jump MCMC approach to estimate the time-varying variance structure of the data. However, this method has some limitations: the network evolution is restricted to changing at most a single edge at a time, and the total number of segments is assumed known a priori. A similar algorithm\u2014also based on Gaussian graphical models\u2014iterates between a convex optimization for determining the graph structure and a dynamic programming algorithm for calculating the segmentation [6]. This approach is fast, has no single-edge-change restriction, and the number of segments is calculated a posteriori; however, it does require that the graph structure is decomposable. Additionally, both of the aforementioned approaches only identify undirected edges and assume that the networks in each segment are independent, preventing data and parameters from being shared between segments.\n\n2 Brief review of structure learning of Bayesian networks\n\nBayesian networks are directed acyclic graphical models that represent conditional dependencies between variables as edges. They define a simple decomposition of the complete joint distribution\u2014a variable is conditionally independent of its non-descendants given its parents. Therefore, the joint distribution over every variable xi can be rewritten as \u220fi P(xi|\u03c0i, \u03b8i), where \u03c0i are the parents of xi, and \u03b8i parameterizes the conditional probability distribution between a variable and its parents. The posterior probability of a given network G (i.e., the set of conditional dependencies) after having observed data D is estimated via Bayes\u2019 rule: P(G|D) \u221d P(D|G)P(G). The structure prior P(G) can be used to incorporate prior knowledge about the network structure, either about the existence of specific edges or the topology more generally (e.g., sparse); if prior information is not available, this is often assumed uniform. The marginal likelihood P(D|G) can be computed exactly, given a conjugate prior for \u03b8i. When the \u03b8i are independent and multinomially distributed, a Dirichlet conjugate prior is used, and the data are complete, the exact solution for the marginal likelihood is the Bayesian-Dirichlet equivalent (BDe) metric [7]. Since we will be modifying it later in this paper, we show the expression for the BDe metric here:\n\nP(D|G) = \u220f(i=1..n) \u220f(j=1..qi) [\u0393(\u03b1ij) / \u0393(\u03b1ij + Nij)] \u220f(k=1..ri) [\u0393(\u03b1ijk + Nijk) / \u0393(\u03b1ijk)]   (1)\n\nwhere qi is the number of configurations of the parent set \u03c0i, ri is the number of discrete states of variable xi, Nij = \u2211(k=1..ri) Nijk, Nijk is the number of times Xi took on the value k given the parent configuration j, and \u03b1ij and \u03b1ijk are Dirichlet hyper-parameters on various entries in \u0398. If \u03b1ijk is set everywhere to \u03b1/(qiri), we get a special case of the BDe metric: the uniform BDe metric (BDeu).\n\nGiven a metric for evaluating the marginal likelihood P(D|G), a technique for finding the best network(s) must be chosen. 
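To make the BDe metric (Equation (1)) concrete, it can be evaluated in log space from a table of sufficient-statistic counts using log-gamma functions. The following is a minimal illustrative sketch, not the authors' implementation; the function name, the nested-list data layout, and the uniform BDeu pseudocounts are assumptions:\n\n```python\nfrom math import lgamma\n\ndef bde_log_score(counts, alpha=1.0):\n    # counts[i][j][k] = number of times variable i took value k under\n    # parent configuration j; alpha is the BDeu equivalent sample size,\n    # so alpha_ijk = alpha / (q_i * r_i), as in the uniform BDe metric.\n    log_p = 0.0\n    for var_counts in counts:\n        q = len(var_counts)          # parent configurations q_i\n        r = len(var_counts[0])       # discrete states r_i\n        a_ijk = alpha / (q * r)\n        a_ij = alpha / q             # a_ij = sum over k of a_ijk\n        for config in var_counts:\n            n_ij = sum(config)\n            log_p += lgamma(a_ij) - lgamma(a_ij + n_ij)\n            for n_ijk in config:\n                log_p += lgamma(a_ijk + n_ijk) - lgamma(a_ijk)\n    return log_p\n```\n\nWorking in log space turns the triple product of gamma-function ratios into a sum of log-gamma differences, which avoids numerical overflow for realistic sample sizes.\n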
Heuristic search methods (e.g., simulated annealing, greedy hill-climbing) may be used to find a best network or set of networks. Alternatively, sampling methods may be used to estimate a posterior over all networks [8]. If the best network is all that is desired, heuristic searches will typically find it more quickly than sampling techniques. In settings where many modes are expected, sampling techniques will more accurately capture posterior probabilities regarding various properties of the network.\nFinally, once a search or sampling strategy has been selected, we must determine how to move through the space of all networks. A move set defines a set of local traversal operators for moving from a particular state (i.e., a network) to nearby states. Ideally, the move set includes changes that allow posterior modes to be frequently visited. For example, it is reasonable to assume that networks that differ by a single edge will have similar likelihoods. A well-designed move set results in fast convergence since less time is spent in the low-probability regions of the state space. For Bayesian networks, the move set is often chosen to be {add an edge, delete an edge, and reverse an edge} [8].\nDBNs are an extension of Bayesian networks to time-series data, enabling cyclic dependencies between variables to be modeled across time. Structure learning of DBNs is essentially the same as described above, except that modeling assumptions are made regarding how far back in time one variable can depend on another (minimum and maximum lag), and constraints need to be placed on edges so that they do not go backwards in time. For notational simplicity, we assume hereafter that the minimum and maximum lag are both 1. More detailed reviews of structure learning can be found in [9, 10].\n\n3 Learning non-stationary dynamic Bayesian networks\n\nWe would like to extend the dynamic Bayesian network model to account for non-stationarity. 
In this section, we detail how the structure learning procedure for DBNs must be changed to account for non-stationarity when learning non-stationary DBNs (nsDBNs).\nAssume that we observe the state of n random variables at N discrete times. Call this multivariate time-series data D, and further assume that it is generated according to a non-stationary process, which is unknown. The process is non-stationary in the sense that the network of conditional dependencies prevailing at any given time is itself changing over time. We call the initial network of conditional dependencies G1; subsequent networks are called Gi for i = 2, 3, . . . , m. We define \u2206gi to be the set of edges that change (either added or deleted) between Gi and Gi+1. The number of edge changes specified in \u2206gi is Si. We define the transition time ti to be the time at which Gi is replaced by Gi+1 in the data-generation process. We call the period of time between consecutive transition times\u2014during which a single network of conditional dependencies is operative\u2014an epoch. So we say that G1 prevails during the first epoch, G2 prevails during the second epoch, and so forth. We will refer to the entire series of prevailing networks as the structure of the nsDBN.\nSince we wish to learn a set of networks instead of one network, we must derive a new expression for the marginal likelihood. Assume that there exist m different epochs with m \u2212 1 transition times T = {t1, . . . , tm\u22121}. The network Gi+1 prevailing in epoch i + 1 differs from network Gi prevailing in epoch i by a set of edge changes we call \u2206gi. We would like to determine the sequence of networks G1, . . . , Gm that maximize the posterior:\n\nP(G1, . . . , Gm|D, T ) \u221d P(D|G1, . . . , Gm, T ) P(G1, . . . , Gm)   (2)\n\u221d P(D|G1, \u2206g1, . . . , \u2206gm\u22121, T ) P(G1, \u2206g1, . . . , \u2206gm\u22121)   (3)\n\u221d P(D|G1, \u2206g1, . . . , \u2206gm\u22121, T ) P(G1) P(\u2206g1, . . . , \u2206gm\u22121)   (4)\n\nWe assume the prior over networks can be further split into independent components describing the initial network and subsequent edge changes, as demonstrated in Equation (4). As in the stationary setting, if prior knowledge about particular edges or overall topology is available, an informative prior can be placed on G1. In the results reported here, we assume this to be uniform. We do, however, place some prior assumptions on the ways in which edges change in the structure. First, we assume that the networks evolve smoothly over time. To encode this prior knowledge, we place an exponential prior with rate \u03bbs on the total number of edge changes s = \u2211i Si. We also assume that the networks evolve slowly over time (i.e., a transition does not occur at every observation) by placing another exponential prior with rate \u03bbm on the number of epochs m. The updated posterior for an nsDBN structure is given as:\n\nP(G1, \u2206g1, . . . , \u2206gm\u22121|D, T ) \u221d P(D|G1, \u2206g1, . . . , \u2206gm\u22121, T ) e^(\u2212\u03bbs s) e^(\u2212\u03bbm m)\n\nTo evaluate the new likelihood, we choose to extend the BDe metric because after the parameters have been marginalized away, edges are the only representation of conditional dependencies that are left; this provides a useful definition of non-stationarity that is both simple to define and easy to analyze. We will assume that any other sources of non-stationarity are either small enough to not alter edges in the predicted network or large enough to be approximated by edge changes in the predicted network.\nIn Equation (1), Nij and Nijk are calculated for a particular parent set over the entire dataset D. However, in an nsDBN, a node may have multiple parent sets operative at different times. 
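Up to normalization, the log of this posterior is just the log-likelihood penalized linearly by the two exponential priors. A hypothetical helper illustrating the combination (the function and argument names are ours, and the log-likelihood is passed in as a precomputed placeholder):\n\n```python\ndef nsdbn_log_posterior(log_likelihood, edge_change_sets, lambda_s, lambda_m):\n    # An nsDBN structure is an initial network G1 plus m-1 edge-change\n    # sets, giving m epochs in total.\n    m = len(edge_change_sets) + 1\n    # s = total number of edge changes across all transitions.\n    s = sum(len(changes) for changes in edge_change_sets)\n    # Exponential priors exp(-lambda_s * s) and exp(-lambda_m * m),\n    # combined with the likelihood in log space (up to normalization).\n    return log_likelihood - lambda_s * s - lambda_m * m\n```\n\nLarger lambda_s favors structures whose networks change few edges between epochs; larger lambda_m favors structures with fewer epochs overall.\n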
The calculation for Nij and Nijk must therefore be modified to specify the intervals during which each parent set is operative. Note that an interval may be defined over several epochs. Specifically, an epoch is defined between adjacent transition times, while an interval is defined over the epochs during which a particular parent set is operative (which may include all epochs).\nFor each node i, the previous parent set \u03c0i in the BDe metric is replaced by a set of parent sets \u03c0ih, where h indexes the interval Ih during which parent set \u03c0ih is operative for node i. Let pi be the number of such intervals and let qih be the number of configurations of \u03c0ih. Then we can write:\n\nP(D|G1, . . . , Gm, T ) \u221d \u220f(i=1..n) \u220f(h=1..pi) \u220f(j=1..qih) [\u0393(\u03b1ij(Ih)) / \u0393(\u03b1ij(Ih) + Nij(Ih))] \u220f(k=1..ri) [\u0393(\u03b1ijk(Ih) + Nijk(Ih)) / \u0393(\u03b1ijk(Ih))]   (5)\n\nwhere the counts Nijk and pseudocounts \u03b1ijk have been modified to apply only to the data in each interval Ih. The modified BDe metric will be referred to as nsBDe. We have chosen to set \u03b1ijk(Ih) = (\u03b1ijk|Ih|)/N (i.e., proportional to the length of the interval during which that particular parent set is operative).\nWe use a sampling approach rather than heuristic search because the posterior over structures includes many modes. Additionally, sampling allows us to answer questions like \u201cwhat are the most likely transition times?\u201d\u2014a question that would be difficult to answer in the context of heuristic search.\nBecause the number of possible nsDBN structures is so large (significantly greater than the number of possible DBNs), we must be careful about what options are included in the move set. To achieve quick convergence, we want to ensure that every move in the move set efficiently jumps between posterior modes. 
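Operationally, the nsBDe metric of Equation (5) means the BDe-style count tables are accumulated separately per interval, with pseudocounts scaled by interval length. A simplified single-node sketch, where for readability we assume the number of parent configurations q and states r are the same in every interval (the data layout and names are illustrative, not the authors'):\n\n```python\nfrom math import lgamma\n\ndef nsbde_node_log_score(samples, intervals, q, r, alpha, N):\n    # samples[t] = (parent_config_j, child_value_k) for one node at time t.\n    # intervals[h] = (start, end): the time range where parent set h is\n    # operative. Pseudocounts scale with interval length:\n    # a_ijk(I_h) = alpha * |I_h| / (N * q * r), and a_ij = r * a_ijk.\n    log_p = 0.0\n    for start, end in intervals:\n        a_ijk = alpha * (end - start) / (N * q * r)\n        counts = [[0] * r for _ in range(q)]\n        for j, k in samples[start:end]:\n            counts[j][k] += 1\n        for row in counts:\n            n_ij = sum(row)\n            log_p += lgamma(r * a_ijk) - lgamma(r * a_ijk + n_ij)\n            for n_ijk in row:\n                log_p += lgamma(a_ijk + n_ijk) - lgamma(a_ijk)\n    return log_p\n```\n\nWith a single interval spanning all N observations, this reduces to the per-node BDeu term of Equation (1); splitting the data into more intervals lets each interval fit its own conditional distribution at the cost of thinner counts.\n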
Therefore, the majority of the next section is devoted to describing effective move sets under different levels of uncertainty.\n\n4 Different settings regarding the number and times of transitions\n\nAn nsDBN can be identified under a variety of settings that differ in the level of uncertainty about the number of transitions and whether the transition times are known. The different settings are abbreviated according to the type of uncertainty: whether the number of transitions is known (KN) or unknown (UN) and whether the transition times themselves are known (KT) or unknown (UT).\nWhen the number and times of transitions are known a priori (KNKT setting), we only need to identify the most likely initial network G1 and sets of edge changes \u2206g1 . . . \u2206gm\u22121. Thus, we wish to maximize Equation (4).\nTo create a move set that results in an effectively mixing chain, we consider which types of local moves result in jumps between posterior modes. As mentioned earlier, structures that differ by a single edge will probably have similar likelihoods. Additionally, structures that have slightly different edge change sets will have similar likelihoods. The add edge, remove edge, add to edge set, remove from edge set, and move from edge set moves are listed as (M1)\u2013(M5) in Table 1 in the Appendix.\nKnowing in advance the times at which all the transitions occur is often unrealistic. When the number of transitions is known but the times are unknown a priori (KNUT setting), the transition times T must also be estimated a posteriori.\n\nFigure 1: Structure learning of nsDBNs under several settings. A. True non-stationary data-generation process. Under the KNKT setting, the recovered structure is exactly this one. B. Under the KNUT setting, the algorithm learns the model-averaged nsDBN structure shown. C. Posterior probabilities of transition times when learning an nsDBN in the UNUT setting (with \u03bbs = 1 and \u03bbm = 5). 
The blue triangles represent the true transition times and the red dots represent one standard deviation from the mean probability obtained from several runs. D. Posterior probabilities of the number of epochs.\n\nStructures with the same edge sets but slightly different transition times will probably have similar likelihoods. Therefore, we can add a new move that proposes a local shift to one of the transition times: let d be some small positive integer and let the new time t\u2032i be drawn from a discrete uniform distribution t\u2032i \u223c DU(ti \u2212 d, ti + d) with the constraint that ti\u22121 < t\u2032i < ti+1. Initially, we set the m \u2212 1 transition times so that the epochs are roughly equal in length. The complete move set for this setting includes all of the moves described previously as well as the new local shift move, listed as (M6) in Table 1 in the Appendix.\nFinally, when the number and times of transitions are unknown (UNUT setting), both m and T must be estimated. While this is the most interesting setting, it is also the most difficult since one of the unknowns is the number of unknowns. Using the reversible jump Markov chain Monte Carlo sampling technique [11], we can further augment the move set to allow for the number of transitions to change. Since the number of epochs m is allowed to vary, this is the only setting that incorporates the prior on m.\nTo allow the number of transitions to change during sampling, we introduce merge and split operations to the move set. For the merge operation, two adjacent edge sets (\u2206gi and \u2206gi+1) are combined to create a new edge set. 
The transition time of the new edge set is selected to be the mean of the previous locations weighted by the size of each edge set: t\u2032i = (Si ti + Si+1 ti+1)/(Si + Si+1). For the split operation, an edge set \u2206gi is randomly chosen and randomly partitioned into two new edge sets \u2206g\u2032i and \u2206g\u2032i+1, with all subsequent edge sets re-indexed appropriately. Each new transition time is selected as described above. The move set is completed with the inclusion of the add transition time and delete transition time operations. These moves are similar to the split and merge operations except they also increase or decrease s, the total number of edge changes in the structure. The four additional moves are listed as (M7)\u2013(M10) in Table 1 in the Appendix.\n\n5 Results on simulated data\n\nTo evaluate the effectiveness of our method, we first apply it to a small, simulated dataset. The first experiment is on a simulated ten-node network with six single-edge changes between seven epochs, where the length of each epoch varies between 20 and 400 observations. The true network is shown in Figure 1A. For each of the three settings, we generate ten individual datasets and then collect 250,000 samples from each, with the first 50,000 samples thrown out for burn-in. We repeat the sample collection 25 times for each dataset to obtain variance estimates on posterior quantities of interest. The sample collection takes about 25 seconds for each dataset on a 3.6 GHz dual-core Intel Xeon machine with 4 GB of RAM, but all runs can easily be executed in parallel. To obtain a consensus (model-averaged) structure prediction, an edge is considered present at a particular time if the posterior probability of the edge is greater than 0.5.\nIn the KNKT setting, the sampler rapidly converges to the correct solution. The value of \u03bbm has no effect in this setting, and the value of \u03bbs is varied between 0.1 and 50. 
The predicted structure is identical to the true structure shown in Figure 1A for a broad range of values: 0.5 \u2264 \u03bbs \u2264 10.0, indicating robust and accurate learning.\nIn the KNUT setting, transition times are unknown and must be estimated a posteriori. The value of \u03bbm still has no effect in this setting and the value of \u03bbs is again varied between 0.1 and 50. The predicted consensus structure is shown in Figure 1B for \u03bbs = 5.0; this choice of \u03bbs provides the most accurate predictions.\nThe estimated structure and transition times are very close to the truth. All edges are correct, with the exception of two missing edges in G1, and the predicted transition times are all within 10 of the true transition times. We also discovered that the convergence rates under the KNUT and KNKT settings were very similar for a given m. This implies that the posterior over transition times is quite smooth; therefore, the mixing rate is not greatly affected when sampling transition times. Finally, we consider the UNUT setting, when the number and times of transitions are both unknown.\nWe use the range 1 \u2264 \u03bbs \u2264 5 because we know from the previous settings that the most accurate solutions were obtained from a prior within this range; the range 1 \u2264 \u03bbm \u2264 50 is selected to provide a wide range of estimates for the prior since we have no a priori knowledge of what it should be.\nWe can examine the posterior probabilities of transition times over all sampled structures, shown in Figure 1C. Highly probable transition times correspond closely with the true transition times indicated by blue triangles; nevertheless, some uncertainty exists about the exact locations of t3 and t4 since the fourth epoch is exceedingly short. We can also examine the posterior number of epochs, shown in Figure 1D. 
The most probable posterior number of epochs is six, close to the true number of seven.\nTo identify the best parameter settings for \u03bbs and \u03bbm, we examine the best F1-measure (the harmonic mean of the precision and recall) for each. The best F1-measure of 0.992 is obtained when \u03bbs = 5 and \u03bbm = 1, although nearly all choices result in an F1-measure above 0.90 (see Appendix).\nTo evaluate the scalability of our technique, we also simulated data from a 100-variable network with an average of fifty edges over five epochs spanning 4800 observations, with one to three edges changing between each epoch. Learning nsDBNs on this data for \u03bbs \u2208 {1, 2, 5} and \u03bbm \u2208 {2, 3, 5} results in F1-measures above 0.93; the assignment \u03bbs = 1 and \u03bbm = 5 is best for this data, with an F1-measure of 0.953.\n\n6 Results on Drosophila muscle development gene expression data\n\nWe also apply our method to identify non-stationary networks using Drosophila development gene expression data from [12]. This data contains expression measurements of 4028 Drosophila genes over 66 time steps throughout development and growth during the embryonic, larval, pupal, and adult stages of life. Using a subset of the genes involved in muscle development, some researchers have identified a single directed network [13], while others have learned a time-varying undirected network [4]. To facilitate comparison with as many existing methods as possible, we apply our method to the same data. Unfortunately, no other techniques predict non-stationary directed networks, so our prediction in Figure 2C is compared to the stationary directed network in Figure 2A and the non-stationary undirected network in Figure 2B.\nWhile all three predictions share many edges, certain similarities between our prediction and one or both of the other two predictions are of special interest. 
In all three predictions, a cluster seems to form around myo61f, msp-300, up, mhc, prm, and mlc1. All of these genes except up are in the myosin family, which contains genes involved in muscle contraction.\n\nFigure 2: Learning nsDBNs from the Drosophila muscle development data. A. The directed network reported by [13]. B. The undirected networks reported by [4]. C. The nsDBN structure learned under the KNKT setting with \u03bbs = 2.0. Only the edges that occurred in greater than 50 percent of the samples are shown, with thicker edges representing connections that occurred more frequently. D. Posterior probabilities of transition times using \u03bbm = \u03bbs = 2 under the UNUT setting. Blue triangles represent the borders of embryonic, larval, pupal, and adult stages. E. Posterior probability of the number of epochs under the UNUT setting.\n\nWithin the directed predictions, msp-300 primarily serves as a hub gene that regulates the other myosin family genes. It is interesting to note that the undirected method predicts connections between mlc1, prm, and mhc while neither directed method makes these predictions. Since msp-300 seems to serve as a regulator of these genes, the method from [4] may be unable to distinguish between direct interactions and correlations due to its undirected nature.\nDespite the similarities, some notable differences exist between our prediction and the other two predictions. First, we predict interactions from myo61f to both prm and up, neither of which is predicted by the other methods, suggesting a greater role for myo61f during muscle development. Also, we do not predict any interactions with twi. During muscle development in Drosophila, twi acts as a regulator of mef2, which in turn regulates some myosin family genes, including mlc1 and mhc [14]; our prediction of no direct connection from twi mirrors this biological behavior. 
Finally, we note that in our predicted structure, actn never connects as a regulator (parent) to any other genes, unlike in the network in Figure 2A. Since actn (actinin) only binds actin, we do not expect it to regulate other muscle development genes, even indirectly.\nWe can also look at the posterior probabilities of transition times and epochs under the UNUT setting. These plots are shown in Figures 2D and 2E, respectively. The transition times with high posterior probabilities correspond well to the embryonic\u2192larval and the larval\u2192pupal transitions, but a posterior peak occurs well before the supposed time of the pupal\u2192adult transition; this reveals that the gene expression program governing the transition to adult morphology is active well before the fly emerges from the pupa, as would clearly be expected. Also, we see that the most probable number of epochs is three or four, mirroring closely the total number of developmental stages.\nSince we could not biologically validate the fly network, we generated a non-stationary time-series with the same number of nodes and a similar level of connectivity to evaluate the accuracy of a recovered nsDBN on a problem of exactly this size. We generated data from an nsDBN with 66 observations and transition times at 30, 40, and 58 to mirror the number of observations in the embryonic, larval, pupal, and adult stages of the experimental fly data. Since it is difficult to estimate the amount of noise in the experimental data, we simulated noise at 1:1 to 4:1 signal-to-noise ratios.\nFinally, since many biological processes have more variables than observations, we examined the effect of increasing the number of experimental replicates. 
We found that the best F1-measures (greater than 0.75 across all signal-to-noise ratios and experimental replicates) were obtained when \u03bbm = \u03bbs = 2, which is why we used those values to analyze the Drosophila muscle network data.\n\n7 Discussion\n\nNon-stationary dynamic Bayesian networks provide a useful framework for learning Bayesian networks when the generating processes are non-stationary. Using the move sets described in this paper, nsDBN learning is efficient even for networks of 100 variables, generalizes to situations of varying uncertainty (KNKT, KNUT, and UNUT), and yields predictions that are stable over many choices of hyper-parameters. Additionally, by using a sampling-based approach, our method allows us to assess a confidence for each predicted edge\u2014an advantage that neither [13] nor [4] shares.\nWe have demonstrated the feasibility of learning an nsDBN in all three settings using simulated data, and in the KNKT and UNUT settings using real biological data. Although the predicted fly muscle development networks are difficult to verify, simulated experiments of a similar scale demonstrate highly accurate predictions, even with noisy data and few replicates.\nNon-stationary DBNs offer all of the advantages of DBNs (identifying directed non-linear interactions between multivariate time-series) and are additionally able to identify non-stationarities in the interactions between time-series. In future work, we hope to analyze data from other fields that have traditionally used dynamic Bayesian networks and instead use nsDBNs to identify and model previously unknown or uncharacterized non-stationary behavior.\n\nReferences\n\n[1] Nir Friedman, Michal Linial, Iftach Nachman, and Dana Pe\u2019er. Using Bayesian networks to analyze expression data. In RECOMB 4, pages 127\u2013135. ACM Press, 2000.\n\n[2] V. Anne Smith, Jing Yu, Tom V. Smulders, Alexander J. Hartemink, and Erich D. Jarvis. 
Computational inference of neural information flow networks. PLoS Computational Biology, 2(11):1436\u20131449, 2006.\n\n[3] Steve Hanneke and Eric P. Xing. Discrete temporal models of social networks. In Workshop on Statistical Network Analysis, ICML 23, 2006.\n\n[4] Fan Guo, Steve Hanneke, Wenjie Fu, and Eric P. Xing. Recovering temporally rewiring networks: A model-based approach. In ICML 24, 2007.\n\n[5] Makram Talih and Nicolas Hengartner. Structural learning with time-varying components: Tracking the cross-section of financial time series. Journal of the Royal Statistical Society B, 67(3):321\u2013341, 2005.\n\n[6] Xiang Xuan and Kevin Murphy. Modeling changing dependency structure in multivariate time series. In ICML 24, 2007.\n\n[7] David Heckerman, Dan Geiger, and David Maxwell Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20(3):197\u2013243, 1995.\n\n[8] Claudia Tarantola. MCMC model determination for discrete graphical models. Statistical Modelling, 4(1):39\u201361, 2004.\n\n[9] P. Krause. Learning probabilistic networks. The Knowledge Engineering Review, 13(4):321\u2013351, 1998.\n\n[10] Kevin Murphy. Learning Bayesian network structure from sparse data sets. Technical Report 990, Computer Science Department, University of California at Berkeley, 2001.\n\n[11] Peter J. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4):711\u2013732, 1995.\n\n[12] M. Arbeitman, E. Furlong, F. Imam, E. Johnson, B. Null, B. Baker, M. Krasnow, M. Scott, R. Davis, and K. White. Gene expression during the life cycle of Drosophila melanogaster. Science, 297(5590):2270\u20132275, 2002.\n\n[13] Wentao Zhao, Erchin Serpedin, and Edward R. Dougherty. Inferring gene regulatory networks from time series data using the minimum description length principle. 
Bioinformatics, 22(17):2129\u20132135, 2006.\n\n[14] T. Sandmann, L. Jensen, J. Jakobsen, M. Karzynski, M. Eichenlaub, P. Bork, and E. Furlong. A temporal map of transcription factor activity: mef2 directly regulates target genes at all stages of muscle development. Developmental Cell, 10(6):797\u2013807, 2006.\n", "award": [], "sourceid": 450, "authors": [{"given_name": "Joshua", "family_name": "Robinson", "institution": null}, {"given_name": "Alexander", "family_name": "Hartemink", "institution": null}]}