{"title": "Using Social Dynamics to Make Individual Predictions: Variational Inference with a Stochastic Kinetic Model", "book": "Advances in Neural Information Processing Systems", "page_first": 2783, "page_last": 2791, "abstract": "Social dynamics is concerned primarily with interactions among individuals and the resulting group behaviors, modeling the temporal evolution of social systems via the interactions of individuals within these systems. In particular, the availability of large-scale data from social networks and sensor networks offers an unprecedented opportunity to predict state-changing events at the individual level. Examples of such events include disease transmission, opinion transition in elections, and rumor propagation. Unlike previous research focusing on the collective effects of social systems, this study makes efficient inferences at the individual level. In order to cope with dynamic interactions among a large number of individuals, we introduce the stochastic kinetic model to capture adaptive transition probabilities and propose an efficient variational inference algorithm the complexity of which grows linearly \u2014 rather than exponentially\u2014 with the number of individuals. To validate this method, we have performed epidemic-dynamics experiments on wireless sensor network data collected from more than ten thousand people over three years. The proposed algorithm was used to track disease transmission and predict the probability of infection for each individual. 
Our results demonstrate that this method is more efficient than sampling while nonetheless achieving high accuracy.", "full_text": "Using Social Dynamics to Make Individual Predictions:\nVariational Inference with a Stochastic Kinetic Model\n\nZhen Xu, Wen Dong, and Sargur Srihari\n\nDepartment of Computer Science and Engineering\n\nUniversity at Buffalo\n\n{zxu8,wendong,srihari}@buffalo.edu\n\nAbstract\n\nSocial dynamics is concerned primarily with interactions among individuals and the\nresulting group behaviors, modeling the temporal evolution of social systems via\nthe interactions of individuals within these systems. In particular, the availability of\nlarge-scale data from social networks and sensor networks offers an unprecedented\nopportunity to predict state-changing events at the individual level. Examples\nof such events include disease transmission, opinion transition in elections, and\nrumor propagation. Unlike previous research focusing on the collective effects\nof social systems, this study makes ef\ufb01cient inferences at the individual level. In\norder to cope with dynamic interactions among a large number of individuals, we\nintroduce the stochastic kinetic model to capture adaptive transition probabilities\nand propose an ef\ufb01cient variational inference algorithm the complexity of which\ngrows linearly \u2014 rather than exponentially\u2014 with the number of individuals.\nTo validate this method, we have performed epidemic-dynamics experiments on\nwireless sensor network data collected from more than ten thousand people over\nthree years. The proposed algorithm was used to track disease transmission and\npredict the probability of infection for each individual. Our results demonstrate\nthat this method is more ef\ufb01cient than sampling while nonetheless achieving high\naccuracy.\n\n1\n\nIntroduction\n\nThe \ufb01eld of social dynamics is concerned primarily with interactions among individuals and the\nresulting group behaviors. 
Research in social dynamics models the temporal evolution of social systems via the interactions of the individuals within these systems [9]. For example, opinion dynamics can model the opinion state transitions of an entire population in an election scenario [3], and epidemic dynamics can predict disease outbreaks ahead of time [10]. While traditional social-dynamics models focus primarily on the macroscopic effects of social systems, often we instead wish to know the answers to more specific questions. Given the movement and behavior history of a subject with Ebola, can we tell how many people should be tested or quarantined? A city-wide quarantine is unnecessarily broad, yet a family-only quarantine is insufficient. We aim to develop a method to evaluate the paths of illness transmission and the risks of infection for individuals, so that limited medical resources can be distributed most efficiently.
The rapid growth of both social networks and sensor networks offers an unprecedented opportunity to collect abundant data at the individual level. From these data we can extract temporal interactions among individuals, such as meeting or taking the same class. To take advantage of this opportunity, we model social dynamics from an individual perspective. Although such an approach has considerable potential, in practice it is difficult to model the dynamic interactions and handle the costly computations when a large number of individuals are involved. In this paper, we introduce an event-based model into social systems to characterize their temporal evolutions and make tractable inferences on the individual level.

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

Our research on the temporal evolutions of social systems is related to dynamic Bayesian networks and continuous time Bayesian networks [13, 18, 21]. 
Traditionally, a coupled hidden Markov model\nis used to capture the interactions of components in a system [2], but this model does not consider\ndynamic interactions. However, a stochastic kinetic model is capable of successfully describing the\ninteractions of molecules (such as collisions) in chemical reactions [12, 22], and is widely used in\nmany \ufb01elds such as chemistry and cell biology [1, 11]. We introduce this model into social dynamics\nand use it to focus on individual behaviors.\nA challenge in capturing the interactions of individuals is that in social dynamics the state space grows\nexponentially with the number of individuals, which makes exact inference intractable. To resolve\nthis we must apply approximate inference methods. One class of these involves sampling-based\nmethods. Rao and Teh introduce a Gibbs sampler based on local updates [20], while Murphy and\nRussell introduce Rao-Blackwellized particle \ufb01ltering for dynamic Bayesian networks [17]. However,\nsampling-based methods sometimes mix slowly and require a large number of samples/particles. To\ndemonstrate this issue, we offer empirical comparisons with two major sampling methods in Section\n4. An alternative class of approximations is based on variational inference. Opper and Sanguinetti\napply the variational mean \ufb01eld approach to factor a Markov jump process [19], and Cohn and El-Hay\nfurther improve its ef\ufb01ciency by exploiting the structure of the target network [4]. A problem is that\nin an event-based model such as a stochastic kinetic model (SKM), the variational mean \ufb01eld is not\napplicable when a single event changes the states of two individuals simultaneously. Here, we use a\ngeneral expectation propagation principle [14] to design our algorithm.\nThis paper makes three contributions: First, we introduce the discrete event model into social\ndynamics and make tractable inferences on both individual behaviors and collective effects. 
To this end, we apply the stochastic kinetic model to define adaptive transition probabilities that characterize the dynamic interaction patterns in social systems. Second, we design an efficient variational inference algorithm whose computation complexity grows linearly with the number of individuals. As a result, it scales very well in large social systems. Third, we conduct experiments on epidemic dynamics to demonstrate that our algorithm can track the transmission of epidemics and predict the probability of infection for each individual. Further, we demonstrate that the proposed method is more efficient than sampling while nonetheless achieving high accuracy.
The remainder of this paper is organized as follows. In Section 2, we briefly review the coupled hidden Markov model and the stochastic kinetic model. In Section 3, we propose applying a variational algorithm with the stochastic kinetic model to make tractable inferences in social dynamics. In Section 4, we detail empirical results from applying the proposed algorithm to our epidemic data along with the proximity data collected from sensor networks. Section 5 concludes.

2 Background

2.1 Coupled Hidden Markov Model
A coupled hidden Markov model (CHMM) captures the dynamics of a discrete time Markov process that joins a number of distinct hidden Markov models (HMMs), as shown in Figure 1(a). x_t = (x_t^(1), . . . , x_t^(M)) defines the hidden states of all HMMs at time t, and x_t^(m) is the hidden state of HMM m at time t. y_t = (y_t^(1), . . . , y_t^(M)) are the observations of all HMMs at time t, and y_t^(m) is the observation of HMM m at time t. P(x_t | x_{t−1}) are the transition probabilities and P(y_t | x_t) are the emission probabilities of the CHMM. Given the hidden states, all observations are independent. As such, P(y_t | x_t) = ∏_m P(y_t^(m) | x_t^(m)), where P(y_t^(m) | x_t^(m)) is the emission probability for HMM m at time t. The joint probability of the CHMM can be defined as follows:

    P(x_{1,...,T}, y_{1,...,T}) = ∏_{t=1}^{T} P(x_t | x_{t−1}) P(y_t | x_t).   (1)

For a CHMM that contains M HMMs in a binary state, the state space is 2^M, and the state transition kernel is a 2^M × 2^M matrix. In order to make exact inferences, the classic forward-backward algorithm sweeps a forward/filtering pass to compute the forward statistics α_t(x_t) = P(x_t | y_{1,...,t}) and a backward/smoothing pass to estimate the backward statistics β_t(x_t) = P(y_{t+1,...,T} | x_t) / P(y_{t+1,...,T} | y_{1,...,t}). Then it can estimate the one-slice statistics γ_t(x_t) = P(x_t | y_{1,...,T}) = α_t(x_t) β_t(x_t) and the two-slice statistics ξ_t(x_{t−1}, x_t) = P(x_{t−1}, x_t | y_{1,...,T}) = α_{t−1}(x_{t−1}) P(x_t | x_{t−1}) P(y_t | x_t) β_t(x_t) / P(y_t | y_{1,...,t−1}). Its complexity grows exponentially with the number of HMM chains. In order to make tractable inferences, certain factorizations and approximations must be applied. In the next section, we introduce a stochastic kinetic model to lower the dimensionality of transition probabilities.

Figure 1: Illustration of (a) Coupled Hidden Markov Model, (b) Stochastic Kinetic Model.

2.2 The Stochastic Kinetic Model
A stochastic kinetic model describes the temporal evolution of a chemical system with M species X = {X_1, X_2, . . . , X_M} driven by V events (or chemical reactions) parameterized by rate constants c = (c_1, . . . , c_V). 
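Before moving to events, the exact forward-backward recursion of Section 2.1 can be made concrete with a short sketch on the joint state space. This is an illustrative sketch, not the paper's code: the random transition and emission tables are placeholder assumptions, and only the normalized one-slice statistics γ_t are returned.

```python
import numpy as np

def forward_backward(trans, emit_lik):
    """Exact forward-backward on the joint state space of a CHMM.

    trans:    (S, S) transition matrix over the 2^M joint states.
    emit_lik: (T, S) observation likelihoods P(y_t | x_t).
    Returns the one-slice statistics gamma_t(x_t) = P(x_t | y_{1..T}).
    """
    T, S = emit_lik.shape
    alpha = np.zeros((T, S))   # filtering statistics P(x_t | y_{1..t})
    beta = np.ones((T, S))     # scaled backward statistics

    # Forward/filtering pass (uniform prior over joint states).
    a = np.full(S, 1.0 / S) * emit_lik[0]
    alpha[0] = a / a.sum()
    for t in range(1, T):
        a = (alpha[t - 1] @ trans) * emit_lik[t]
        alpha[t] = a / a.sum()

    # Backward/smoothing pass.
    for t in range(T - 2, -1, -1):
        b = trans @ (emit_lik[t + 1] * beta[t + 1])
        beta[t] = b / b.sum()

    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

M = 3                         # three binary chains -> 2^M joint states
S = 2 ** M
rng = np.random.default_rng(0)
trans = rng.random((S, S))
trans /= trans.sum(axis=1, keepdims=True)   # row-stochastic 2^M x 2^M kernel
emit_lik = rng.random((5, S))               # T = 5 time steps
gamma = forward_backward(trans, emit_lik)
```

Even for three chains the kernel already has 2^3 × 2^3 = 64 entries; with M individuals it grows as 4^M, which is exactly the exponential blow-up that motivates the factorized inference of Section 3.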
An event (chemical reaction) k has a general form as follows:

    r_1 X_1 + · · · + r_M X_M  --c_k-->  p_1 X_1 + · · · + p_M X_M.

The species on the left are called reactants, and r_m is the number of mth reactant molecules consumed during the reaction. The species on the right are called products, and p_m is the number of mth product molecules produced in the reaction. Species involved in the reaction (r_m > 0) without consumption or production (r_m = p_m) are called catalysts. At any specific time t, the populations of the species are x_t = (x_t^(1), . . . , x_t^(M)). An event k happens with rate h_k(x_t, c_k), determined by the rate constant and the current population state [22]:

    h_k(x_t, c_k) = c_k g_k(x_t) = c_k ∏_{m=1}^{M} g_k^(m)(x_t^(m)).   (2)

The form of g_k(x_t) depends on the reaction. In our case, we adopt the product form ∏_{m=1}^{M} g_k^(m)(x_t^(m)), which represents the total number of ways that reactant molecules can be selected to trigger event k [22]. Event k changes the populations by Δ_k = x_t − x_{t−1}. The probability that event k will occur during time interval (t, t + dt] is h_k(x_t, c_k)dt. We assume at each discrete time step that no more than one event will occur. This assumption follows the linearization principle in the literature [18], and is valid when the discrete time step is small. We treat each discrete time step as a unit of time, so that h_k(x_t, c_k) represents the probability of an event.
In epidemic modeling, for example, an infection event v_i has the form S + I --c_i--> 2I, such that a susceptible individual (S) is infected by an infectious individual (I) with rate constant c_i. If there is only one susceptible individual (type m = 1) and one infectious individual (type m = 2) involved in this event, h_i(x_t, c_i) = c_i, Δ_i = [−1, 1]^T and P(x_t − x_{t−1} = Δ_i) = P(x_t | x_{t−1}, v_i) = c_i.
In a traditional hidden Markov model, the transition kernel is typically fixed. 
In comparison, SKM is better at capturing dynamic interactions in terms of events, with rates dependent on reactant populations, as shown in Eq. (2).

3 Variational Inference with the Stochastic Kinetic Model
In this section, we define the likelihood of the entire sequence of hidden states and observations for an event-based model, and derive a variational inference algorithm and a parameter-learning algorithm.
3.1 Likelihood for Event-based Model
In social dynamics, we use a discrete time Markov model to describe the temporal evolutions of a set of individuals x^(1), . . . , x^(M) according to a set of V events. To cope with dynamic interactions, we introduce the SKM and express the state transition probabilities in terms of event probabilities, as shown in Figure 1(b). We assume at each discrete time step that no more than one event will occur. Let v_1, . . . , v_T be a sequence of events, x_1, . . . , x_T a sequence of hidden states, and y_1, . . . , y_T a set of observations. Similar to Eq. (1), the likelihood of the entire sequence is as follows:

    P(x_{1,...,T}, y_{1,...,T}, v_{1,...,T}) = ∏_{t=1}^{T} P(x_t, v_t | x_{t−1}) P(y_t | x_t), where

    P(x_t, v_t | x_{t−1}) = { c_k · g_k(x_{t−1}) · δ(x_t − x_{t−1} ≡ Δ_k)          if v_t = k,
                             (1 − Σ_k c_k g_k(x_{t−1})) · δ(x_t − x_{t−1} ≡ 0)    if v_t = ∅.   (3)

P(x_t, v_t | x_{t−1}) is the event-based transition kernel. δ(x_t − x_{t−1} ≡ Δ_k) is 1 if the previous state is x_{t−1} and the current state is x_t = x_{t−1} + Δ_k, and 0 otherwise. Δ_k is the effect of event k. ∅ represents an auxiliary event, meaning that no event occurs. 
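As a concrete illustration of the event-based kernel in Eq. (3), the sketch below evaluates P(x_t, v_t | x_{t−1}) for a two-species system. The events, rate constants, and populations are hypothetical, chosen only so that the per-step event probabilities stay well below one (consistent with the single-event-per-step assumption).

```python
import numpy as np

def transition_prob(x_prev, x_curr, event, c, g, deltas):
    """Event-based transition kernel in the spirit of Eq. (3): event k moves
    the state by exactly Delta_k with probability c_k * g_k(x_{t-1}); the
    auxiliary "no event" keeps the state unchanged and absorbs the
    remaining probability mass."""
    x_prev = np.asarray(x_prev)
    x_curr = np.asarray(x_curr)
    if event is None:  # auxiliary event: nothing happens
        if not np.array_equal(x_curr, x_prev):
            return 0.0
        return 1.0 - sum(c[k] * g[k](x_prev) for k in range(len(c)))
    if not np.array_equal(x_curr, x_prev + deltas[event]):
        return 0.0
    return c[event] * g[event](x_prev)

# Hypothetical SIS-style events on a population vector x = (#S, #I):
# infection S + I -> 2I and recovery I -> S (rate constants are made up).
c = [0.0005, 0.01]
g = [lambda x: x[0] * x[1],  # ways to pair one S with one I
     lambda x: x[1]]         # ways to pick one I
deltas = [np.array([-1, 1]), np.array([1, -1])]

x = np.array([50, 10])
p_infect = transition_prob(x, x + deltas[0], 0, c, g, deltas)   # 0.0005 * 500
p_recover = transition_prob(x, x + deltas[1], 1, c, g, deltas)  # 0.01 * 10
p_nothing = transition_prob(x, x, None, c, g, deltas)           # remainder
```

With these numbers the three outcome probabilities sum to one, which is exactly the normalization that the auxiliary event ∅ enforces.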
Substituting the product form of g_k, the transition kernel can be written as follows:

    P(x_t, v_t = k | x_{t−1}) = c_k ∏_m g_k^(m)(x_{t−1}^(m)) · ∏_m δ(x_t^(m) − x_{t−1}^(m) ≡ Δ_k^(m)),   (4)

    P(x_t, v_t = ∅ | x_{t−1}) = (1 − Σ_k c_k ∏_m g_k^(m)(x_{t−1}^(m))) · ∏_m δ(x_t^(m) − x_{t−1}^(m) ≡ 0),   (5)

where δ(x_t^(m) − x_{t−1}^(m) ≡ Δ_k^(m)) is 1 if the previous state of an individual m is x_{t−1}^(m) and the current state is x_t^(m) = x_{t−1}^(m) + Δ_k^(m), and 0 otherwise.
3.2 Variational Inference for Stochastic Kinetic Model
As noted in Section 2.1, exact inference in social dynamics is intractable due to the formidable state space. However, we can approximate the posterior distribution P(x_{1,...,T}, v_{1,...,T} | y_{1,...,T}) using an approximate distribution within the exponential family. The inference algorithm minimizes the KL divergence between these two distributions, which can be formulated as an optimization problem [14]:

    Minimize:   Σ_{t,x_{t−1},x_t,v_t} ξ̂_t(x_{t−1}, x_t, v_t) · log [ ξ̂_t(x_{t−1}, x_t, v_t) / (P(x_t, v_t | x_{t−1}) P(y_t | x_t)) ]
                − Σ_{t,x_t} ∏_m γ̂_t^(m)(x_t^(m)) · log ∏_m γ̂_t^(m)(x_t^(m))   (6)

    Subject to: Σ_{v_t, x_{t−1}, {x_t\x_t^(m)}} ξ̂_t(x_{t−1}, x_t, v_t) = γ̂_t^(m)(x_t^(m)), for all t, m, x_t^(m),
                Σ_{v_t, {x_{t−1}\x_{t−1}^(m)}, x_t} ξ̂_t(x_{t−1}, x_t, v_t) = γ̂_{t−1}^(m)(x_{t−1}^(m)), for all t, m, x_{t−1}^(m),
                Σ_{x_t^(m)} γ̂_t^(m)(x_t^(m)) = 1, for all t, m.

The objective function is the Bethe free energy, composed of the average energy and the Bethe entropy approximation [23]. ξ̂_t(x_{t−1}, x_t, v_t) is the approximate two-slice statistics and γ̂_t^(m)(x_t^(m)) is the approximate one-slice statistics for each individual m. They form the approximate distribution over which to minimize the Bethe free energy. Σ_{t,x_{t−1},x_t,v_t} is an abbreviation for summing over t, x_{t−1}, x_t, and v_t. Σ_{{x_t\x_t^(m)}} is the sum over all individuals in x_t except x_t^(m). We use similar abbreviations below. 
The first two sets of constraints are marginalization conditions, and the third is a normalization condition. To solve this constrained optimization problem, we first define the Lagrange function using Lagrange multipliers to weight the constraints, then take the partial derivatives with respect to ξ̂_t(x_{t−1}, x_t, v_t) and γ̂_t^(m)(x_t^(m)). The dual problem is to find the approximate forward statistics α̂_{t−1}^(m)(x_{t−1}^(m)) and backward statistics β̂_t^(m)(x_t^(m)) that maximize the pseudo-likelihood function. The duality is between minimizing the Bethe free energy and maximizing the pseudo-likelihood. The fixed-point solution for the primal problem is as follows¹:

    ξ̂_t(x_{t−1}^(m), x_t^(m), v_t) = (1/Z_t) Σ_{m′≠m: x_{t−1}^(m′), x_t^(m′)} P(x_t, v_t | x_{t−1}) · ∏_m α̂_{t−1}^(m)(x_{t−1}^(m)) · ∏_m P(y_t^(m) | x_t^(m)) · ∏_m β̂_t^(m)(x_t^(m)).   (7)

ξ̂_t(x_{t−1}^(m), x_t^(m), v_t) is the two-slice statistics for an individual m, and Z_t is the normalization constant. Given the factorized form of P(x_t, v_t | x_{t−1}) in Eqs. (4) and (5), everything in Eq. (7) can be written in a factorized form. 
After reformulating the terms relevant to the individual m, ξ̂_t(x_{t−1}^(m), x_t^(m), v_t) can be shown neatly as follows:

    ξ̂_t(x_{t−1}^(m), x_t^(m), v_t) = (1/Z_t) P̂(x_t^(m), v_t | x_{t−1}^(m)) · α̂_{t−1}^(m)(x_{t−1}^(m)) P(y_t^(m) | x_t^(m)) β̂_t^(m)(x_t^(m)),   (8)

where the marginalized transition kernel P̂(x_t^(m), v_t | x_{t−1}^(m)) for the individual m can be defined as:

    P̂(x_t^(m), v_t = k | x_{t−1}^(m)) = c_k g_k^(m)(x_{t−1}^(m)) · ∏_{m′≠m} g̃_{k,t−1}^(m′) · δ(x_t^(m) − x_{t−1}^(m) ≡ Δ_k^(m)),   (9)

    P̂(x_t^(m), v_t = ∅ | x_{t−1}^(m)) = (1 − Σ_k c_k g_k^(m)(x_{t−1}^(m)) ∏_{m′≠m} ĝ_{k,t−1}^(m′)) · δ(x_t^(m) − x_{t−1}^(m) ≡ 0),   (10)

    g̃_{k,t−1}^(m′) = [ Σ_{x_{t−1}^(m′), x_t^(m′)} α̂_{t−1}^(m′)(x_{t−1}^(m′)) g_k^(m′)(x_{t−1}^(m′)) δ(x_t^(m′) − x_{t−1}^(m′) ≡ Δ_k^(m′)) P(y_t^(m′) | x_t^(m′)) β̂_t^(m′)(x_t^(m′)) ]
                    / [ Σ_{x_{t−1}^(m′), x_t^(m′)} α̂_{t−1}^(m′)(x_{t−1}^(m′)) δ(x_t^(m′) − x_{t−1}^(m′) ≡ 0) P(y_t^(m′) | x_t^(m′)) β̂_t^(m′)(x_t^(m′)) ],

    ĝ_{k,t−1}^(m′) = [ Σ_{x_{t−1}^(m′), x_t^(m′)} α̂_{t−1}^(m′)(x_{t−1}^(m′)) g_k^(m′)(x_{t−1}^(m′)) δ(x_t^(m′) − x_{t−1}^(m′) ≡ 0) P(y_t^(m′) | x_t^(m′)) β̂_t^(m′)(x_t^(m′)) ]
                    / [ Σ_{x_{t−1}^(m′), x_t^(m′)} α̂_{t−1}^(m′)(x_{t−1}^(m′)) δ(x_t^(m′) − x_{t−1}^(m′) ≡ 0) P(y_t^(m′) | x_t^(m′)) β̂_t^(m′)(x_t^(m′)) ].

In the above equations, we consider the mean field effect by summing over the current and previous states of all the other individuals m′ ≠ m. The marginalized transition kernel considers the probability of event k on the individual m given the context of the temporal evolutions of the other individuals. Comparing Eqs. (9) and (10) with Eqs. 
(4) and (5), instead of multiplying g_k^(m′)(x_{t−1}^(m′)) for each individual m′ ≠ m, we use the expected value of g_k^(m′) with respect to the marginal probability distribution of x_{t−1}^(m′).
Complexity Analysis: In our inference algorithm, the most computation-intensive step is the marginalization in Eqs. (9)-(10). The complexity is O(M S²), where M is the number of individuals and S is the size of the state space of a single individual. The complexity of the entire algorithm is therefore O(M S²T N), where T is the number of time steps and N is the number of iterations until convergence. As such, the complexity of our algorithm grows only linearly with the number of individuals; it offers excellent scalability when the number of tracked individuals becomes large.
3.3 Parameter Learning
In order to learn the rate constant c_k, we maximize the expected log likelihood. In a stochastic kinetic model, the probability of a sample path is given in Eq. (3). The expected log likelihood over the posterior probability conditioned on the observations y_1, . . . , y_T takes the following form:

    E[log P(x_{1,...,T}, y_{1,...,T}, v_{1,...,T})] = Σ_{t,x_{t−1},x_t,v_t} ξ̂_t(x_{t−1}, x_t, v_t) · log(P(x_t, v_t | x_{t−1}) P(y_t | x_t)).

ξ̂_t(x_{t−1}, x_t, v_t) is the approximate two-slice statistics defined in Eq. (6). 
Maximizing this expected log likelihood by setting its partial derivative over the rate constants to 0 gives the maximum expected log likelihood estimation of these rate constants:

    c_k = Σ_{t,x_{t−1},x_t} ξ̂_t(x_{t−1}, x_t, v_t = k) / Σ_{t,x_{t−1},x_t} ξ̂_t(x_{t−1}, x_t, v_t = ∅) g_k(x_{t−1})
        ≈ Σ_t Σ_{x_{t−1},x_t} ξ̂_t(x_{t−1}, x_t, v_t = k) / Σ_t ∏_m Σ_{x_{t−1}^(m)} γ̂_{t−1}^(m)(x_{t−1}^(m)) g_k^(m)(x_{t−1}^(m)).   (11)

¹The derivations for the optimization problem and its solution are shown in the Supplemental Material.

As such, the rate constant for event k is the expected number of times that this event has occurred divided by the total expected number of times this event could have occurred.
To summarize, we provide the variational inference algorithm below.

Algorithm: Variational Inference with a Stochastic Kinetic Model
Given the observations y_t^(m) for t = 1, . . . , T and m = 1, . . . , M, find x_t^(m), v_t and rate constants c_k for k = 1, . . . , V.
Latent state inference. Iterate through the following forward and backward passes until convergence, where P̂(x_t^(m), v_t | x_{t−1}^(m)) is given by Eqs. (9) and (10).

• Forward pass. For t = 1, . . . , T and m = 1, . . . , M, update α̂_t^(m)(x_t^(m)) according to

    α̂_t^(m)(x_t^(m)) ← (1/Z_t) Σ_{x_{t−1}^(m), v_t} α̂_{t−1}^(m)(x_{t−1}^(m)) P̂(x_t^(m), v_t | x_{t−1}^(m)) P(y_t^(m) | x_t^(m)).

• Backward pass. For t = T, . . . , 1 and m = 1, . . . , M, update β̂_{t−1}^(m)(x_{t−1}^(m)) according to

    β̂_{t−1}^(m)(x_{t−1}^(m)) ← (1/Z_t) Σ_{x_t^(m), v_t} β̂_t^(m)(x_t^(m)) P̂(x_t^(m), v_t | x_{t−1}^(m)) P(y_t^(m) | x_t^(m)).

Parameter estimation. Iterate through the latent state inference (above) and the rate constant estimates of c_k according to Eq. 
(11), until convergence.

4 Experiments on Epidemic Applications

In this section, we evaluate the performance of the variational inference with a stochastic kinetic model (VISKM) algorithm on epidemic dynamics, with which we predict the transmission of diseases and the health status of each individual based on proximity data collected from sensor networks.

4.1 Epidemic Dynamics
In epidemic dynamics, G_t = (M, E_t) is a dynamic network, where each node m ∈ M is an individual in the network, and E_t = {(m_i, m_j)} is a set of edges in G_t representing that individuals m_i and m_j have interacted at a specific time t. There are two possible hidden states for each individual m at time t, x_t^(m) ∈ {0, 1}, where 0 indicates the susceptible state and 1 the infectious state. y_t^(m) ∈ {0, 1} represents the presence or absence of symptoms for individual m at time t. P(y_t^(m) | x_t^(m)) represents the observation probability. We define three types of events in epidemic applications: (1) A previously infectious individual recovers and becomes susceptible again: I --c_1--> S. (2) An infectious individual infects a susceptible individual in the network: S + I --c_2--> 2I. (3) A susceptible individual in the network is infected by an outside infectious individual: S --c_3--> I. Based on these events, the transition kernel can be defined as follows:

    P(x_t^(m) = 0 | x_{t−1}^(m) = 1) = c_1,   P(x_t^(m) = 1 | x_{t−1}^(m) = 1) = 1 − c_1,
    P(x_t^(m) = 0 | x_{t−1}^(m) = 0) = (1 − c_3)(1 − c_2)^{C_{m,t}},   P(x_t^(m) = 1 | x_{t−1}^(m) = 0) = 1 − (1 − c_3)(1 − c_2)^{C_{m,t}},

where C_{m,t} = Σ_{m′:(m′,m)∈E_t} δ(x_t^(m′) ≡ 1) is the number of possible infectious sources for individual m at time t. Intuitively, the probability of a susceptible individual becoming infected is 1 minus the probability that no infectious individual (inside or outside the network) infected him. 
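The susceptible-to-infectious entry of this kernel can be sketched numerically; the rate constants below are illustrative assumptions, not values learned in the experiments.

```python
from math import isclose

def infection_prob(c2, c3, n_infectious_contacts):
    """P(x_t^(m) = 1 | x_{t-1}^(m) = 0) from Section 4.1: one minus the
    probability that neither an outside source (rate c3) nor any of the
    C_{m,t} infectious contacts (rate c2 each) transmits the disease."""
    return 1.0 - (1.0 - c3) * (1.0 - c2) ** n_infectious_contacts

# Hypothetical rate constants: recovery, contact infection, outside infection.
c1, c2, c3 = 0.2, 0.01, 0.001

p_exact = infection_prob(c2, c3, 5)   # susceptible individual with C_{m,t} = 5
p_linear = c3 + c2 * 5                # small-probability linearization
```

For small c2 and c3 the exact value and the linearization c_3 + c_2 · C_{m,t} agree closely, which is what justifies the approximation given in the text.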
When the probability of infection is very small, we can approximate P(x_t^(m) = 1 | x_{t−1}^(m) = 0) ≈ c_3 + c_2 · C_{m,t}.

4.2 Experimental Results

Data Explanation: We employ two data sets of epidemic dynamics. The real data set is collected from the Social Evolution experiment [5, 6]. This study records "common cold" symptoms of 65 students living in a university residence hall from January 2009 to April 2009, tracking their locations and proximities using mobile phones. In addition, the students took periodic surveys regarding their health status and personal interactions. The synthetic data set is built on real mobility traces collected on the Dartmouth College campus from April 2001 to June 2004, containing the movement history of 13,888 individuals [16]. We synthesized disease transmission along this timeline using the popular susceptible-infectious-susceptible (SIS) epidemiology model [15], then applied VISKM to calibrate performance. We selected this data set because we want to demonstrate that our model works on data with a large number of people over a long period of time.
Evaluation Metrics and Baseline Algorithms: We select the receiver operating characteristic (ROC) curve as our performance metric because the discrimination thresholds of diseases vary. We first compare the accuracy and efficiency of VISKM with Gibbs sampling (Gibbs) and particle filtering (PF) on the Social Evolution data set [7, 8].² Both Gibbs sampling and particle filtering iteratively sample the infectious and susceptible latent state sequences and the infection and recovery events conditioned on these state sequences. Gibbs-Prediction-10000 indicates 10,000 iterations of Gibbs sampling with 1000 burn-in iterations for the prediction task. PF-Smoothing-1000 similarly refers to 1000 iterations of particle filtering for the smoothing task. 
All experiments are performed on the same computer.
Individual State Inference: We infer the probabilities of a hidden infectious state for each individual at different times under different scenarios. There are three tasks: 1. Prediction: Given an individual's past health and current interaction patterns, we predict the current infectious latent state. Figure 2(a) compares prediction performance among the different approximate inference methods. 2. Smoothing: Given an individual's interaction patterns and past health with missing periods, we infer the infectious latent states during these missing periods. Figure 2(b) compares the performance of the three inference methods. 3. Expansion: Given the health records of a portion (∼10%) of the population, we estimate the individual infectious states of the entire population before medically inspecting them. For example, given either a group of volunteers willing to report their symptoms or the symptom data of patients who came to hospitals, we determine the probabilities that the people near these individuals also became or will become infected. This information helps the government or aid agencies to efficiently distribute limited medical resources to those most in need. Figure 2(c) compares the performance of the different methods. From the above three graphs, we can see that all three methods identify the infectious states accurately. However, VISKM outperforms Gibbs sampling and particle filtering in terms of area under the ROC curve for all three tasks. VISKM has an advantage in the smoothing task because the backward pass helps to infer the missing states using subsequent observations. In addition, the performance of Gibbs and PF improves as the number of samples/particles increases.
Figure 2(d) shows the performance of the three tasks on the Dartmouth data set. We do not run the same comparison on this data set because sampling takes too much time. 
From the graph, we can see that VISKM accurately infers most of the infectious moments of individuals in a large social system. In addition, the smoothing results are slightly better than the prediction results because we can leverage observations from both directions. The expansion case is relatively poor, because we use only very limited information to derive the results; however, even in this case the ROC curve has good discriminating power to differentiate between infectious and susceptible individuals.
Collective Statistics Inference: After determining the individual results, we aggregate them to approximate the total number of infected individuals in the social system as time evolves. This offers a collective statistical summary of the spread of disease in one area, as in traditional research, which typically scales the sample statistics by the sampling ratio. Figures 2(e) and (f) show that given 20% of the Social Evolution data and 10% of the Dartmouth data, VISKM estimates the collective statistics better than the other methods.
Efficiency and Scalability: Table 1 shows the running time of the different algorithms for the Social Evolution data on the same computer. From the table, we can see that Gibbs sampling runs slightly longer than PF, but they are on the same scale. 
However, VISKM requires much less computation time.

2 Code and data are available at http://cse.buffalo.edu/~wendong/.

[Figure 2: Experimental results. (a-c) show the prediction, smoothing, and expansion ROC performance comparisons for Social Evolution data, while (d) shows performance of the three tasks for Dartmouth data. (e-f) represent the collective statistical inferences for both data sets.]

Table 1: Running time for different approximate inference algorithms. Gibbs_10000 refers to Gibbs sampling for 10,000 iterations, and PF_1000 to particle filtering for 1000 iterations. Other entries follow the same pattern. All times are measured in seconds.

            VISKM   Gibbs_1000   Gibbs_10000   PF_1000   PF_10000
60 People    0.78          771          7820       601       6100
30 People    0.39          255          2556       166       1888
15 People    0.19          101          1003       122       1435

In addition, the computation time of VISKM grows linearly with the number of individuals, which validates the complexity analysis in Section 3.2. Thus, it offers excellent scalability for large social systems. In comparison, the running times of Gibbs sampling and PF grow super-linearly with the number of individuals, and roughly linearly with the number of samples.

Summary: Our proposed VISKM achieves higher accuracy in terms of area under the ROC curve and collective statistics than Gibbs sampling or particle filtering (within 10,000 iterations). More importantly, VISKM is more efficient than sampling, requiring much less computation time. Additionally, the computation time of VISKM grows linearly with the number of individuals, demonstrating its excellent scalability for large social systems.

5 Conclusions

In this paper, we leverage sensor network and social network data to capture the temporal evolution of social dynamics and infer individual behaviors.
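The scaling behavior reported in Table 1 can be checked directly from its entries: VISKM's running time roughly doubles each time the population doubles (linear growth), while Gibbs sampling's more than doubles (super-linear growth). A minimal sketch, using the timing numbers copied from the table:

```python
# Running times in seconds from Table 1, for populations of 15, 30, and 60 people.
people = [15, 30, 60]
viskm = [0.19, 0.39, 0.78]
gibbs_10000 = [1003, 2556, 7820]

def growth_ratios(times):
    """Ratios of consecutive running times as the population doubles."""
    return [t2 / t1 for t1, t2 in zip(times, times[1:])]

# VISKM: ratios stay near 2 as the population doubles -> linear scaling.
print(growth_ratios(viskm))        # roughly [2.05, 2.0]
# Gibbs: ratios exceed 2 and increase with population -> super-linear scaling.
print(growth_ratios(gibbs_10000))  # roughly [2.55, 3.06]
```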
In order to define the adaptive transition kernel, we introduce a stochastic kinetic model that captures the dynamics of complex interactions. In addition, in order to make inference tractable, we propose a variational inference algorithm whose computational complexity grows linearly with the number of individuals. Large-scale experiments on epidemic dynamics demonstrate that our method effectively captures the evolution of social dynamics and accurately infers individual behaviors. More accurate collective effects can also be derived from the aggregated results. Potential applications for our algorithm include the dynamics of emotion, opinion, rumor, collaboration, and friendship.