Mutually Regressive Point Processes

Advances in Neural Information Processing Systems, pages 5115-5126

Ifigeneia Apostolopoulou
Machine Learning Department
Carnegie Mellon University
iapostol@andrew.cmu.edu

Kyle Miller
AutonLab
Carnegie Mellon University
mille856@andrew.cmu.edu

Scott Linderman
Department of Statistics
Stanford University
scott.linderman@stanford.edu

Artur Dubrawski
AutonLab
Carnegie Mellon University
awd@cs.cmu.edu

Abstract

Many real-world data represent sequences of interdependent events unfolding over time. They can be modeled naturally as realizations of a point process.
Despite many potential applications, existing point process models are limited in their ability to capture complex patterns of interaction. Hawkes processes admit many efficient inference algorithms, but are limited to mutually excitatory effects. Nonlinear Hawkes processes allow for more complex influence patterns, but for their estimation it is typically necessary to resort to discrete-time approximations that may yield poor generative models. In this paper, we introduce the first general class of Bayesian point process models extended with a nonlinear component that allows both excitatory and inhibitory relationships in continuous time. We derive a fully Bayesian inference algorithm for these processes using Pólya-Gamma augmentation and Poisson thinning. We evaluate the proposed model on single and multi-neuronal spike train recordings. Results demonstrate that the proposed model, unlike existing point process models, can generate biologically-plausible spike trains, while still achieving competitive predictive likelihoods.

1 Introduction

Many natural phenomena and practical applications involve asynchronous and irregular events such as social media dynamics, neuronal activity, or high-frequency financial markets [1, 2, 3, 4, 5, 6, 7, 8]. Modeling correlations between events of various types may reveal informative patterns, help predict next occurrences, or guide interventions to trigger or prevent future events. Point processes [9] are models for the distribution of sequences of events.

Cox processes, or doubly stochastic processes [10], are generalizations of Poisson processes [11] in which the intensity function is a stochastic process itself. Although there are efficient inference algorithms for some of their variants [12, 13], Cox processes do not explicitly capture temporal correlations between historical and future events.
On the other hand, the Hawkes Process (HP) [14, 15] and its variants [16, 17, 18] constitute a class of point process models in which past events linearly combine to increase the probability of future events. However, purely excitatory effects are incapable of characterizing physiological patterns such as neuronal activity, where inhibitory effects are present and crucial for self-regulation [19, 20, 21]. The work in [22] can support temporal effects beyond the mutual excitation that the HP misses. However, capturing model uncertainty is critical in many applications [23, 24, 25, 26], especially when the size of the available data is limited compared to the model complexity. A rich literature exists on HP-based learning tasks [27, 28, 29, 30, 31, 32].

A nonlinear generalization of the HP allows for both excitatory and inhibitory interactions, but evaluating the probability density of these models requires computing the integrated intensity, which is generally intractable. Instead, we are forced to use discrete-time approximations, which reduce to a Poisson Generalized Linear Model (Poisson-GLM) [33, 3], making learning of these models from data very efficient. However, the estimated regression coefficients may vary widely depending on the boundaries chosen for aggregation [34]. Empirical evidence suggests that while suitable for one-step predictions, such models may suffer stochastic instability and yield non-physical predictions [35]. There is currently limited statistical theory for point process models that support complex temporal interactions in a continuous-time regime.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.
To this end, we develop the first class of Bayesian point process models, Mutually Regressive Point Processes (MR-PP), that allow for nonlinear temporal interactions while still admitting an efficient, fully-Bayesian inference algorithm in continuous time.

2 Proposed Model

2.1 Problem statement

We are interested in learning distributions over event sequences (point processes). These distributions are mutually regressive in the sense that past event occurrences can influence the future realization of the process in an arbitrary manner. A point process PP(λ(t)) is characterized by an intensity function λ(t), so that in an infinitesimally wide interval [t, t + dt], the probability of the arrival of a new event is λ(t)dt [36].

2.2 Classical Hawkes Process

A Hawkes process (HP) [14, 15] of N event types HP^N(λ*_n(t)) is characterized by the intensity functions λ*_n(t) for the events of type n, defined as:

λ*_n(t) = λ*_n + Σ_{m=1}^{N} Σ_{i=1}^{K_m} λ_{m,n}(t, t^m_i) I(t^m_i < t),   (1)

λ_{m,n}(t, t^m_i) = α_{m,n} e^{−δ_{m,n}(t − t^m_i)},   (2)

where λ*_n ≥ 0, α_{m,n} ≥ 0, and δ_{m,n} > 0. t^m_i is the arrival time of the i-th event of type m and K_m is the number of events of type m. I is the indicator function. By the superposition theorem for Poisson processes, the additive terms in Equation (1) can be viewed as the superposition of independent non-homogeneous Poisson processes (with intensity functions that vary in time) characterized by the intensity functions λ_{m,n}(t, t^m_i), each triggered by the i-th event of type m that occurred before time t, and an exogenous, homogeneous Poisson process characterized by the constant intensity function λ*_n.
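As a concrete illustration of Equations (1)-(2), the following minimal Python sketch (our own code and naming, not the authors' C++ library) evaluates the multivariate HP intensity from past event times:

```python
import numpy as np

def hp_intensity(t, events, lam0, alpha, delta):
    """Evaluate lambda*_n(t) of Eqs. (1)-(2) for every type n.

    events : list of N arrays; events[m] holds arrival times of type m
    lam0   : (N,) exogenous rates lambda*_n
    alpha  : (N, N) excitation coefficients, alpha[m, n] = alpha_{m,n}
    delta  : (N, N) decay rates, delta[m, n] = delta_{m,n}
    """
    N = len(events)
    lam = lam0.astype(float).copy()
    for m in range(N):
        past = events[m][events[m] < t]       # only events strictly before t
        for n in range(N):
            # sum_i alpha_{m,n} * exp(-delta_{m,n} * (t - t^m_i))
            lam[n] += alpha[m, n] * np.exp(-delta[m, n] * (t - past)).sum()
    return lam
```

With no past events the intensity reduces to the exogenous rates λ*_n, matching the homogeneous Poisson baseline in Equation (1).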
The HP is a mutually exciting point process in the sense that past events can only raise the probability of arrival of future events of the same or a different type. Since λ*_n(t) depends on past occurrences, it is a stochastic process itself.

2.3 Mutually Regressive Point Process: a generalization of the Hawkes Process

The intensity function λ_n(t), for events of type n occurring at times ṫ^n_i, of a Mutually Regressive Point Process (MR-PP) is a HP intensity augmented with a probability term. It is defined as follows:

λ_n(t) = λ*_n(t) p_n(t),   (3)

λ*_n(t) = λ*_n + Σ_{m=1}^{N} Σ_{i=1}^{K_m} λ_{m,n}(t, ṫ^m_i) I(ṫ^m_i < t),   (4)

p_n(t) = σ(w_n^T h(t)),   (5)

h_m(t) = c Σ_{i=1}^{K_m} h(t, ṫ^m_i) I(ṫ^m_i < t),   (6)

h(t, ṫ^m_i) = e^{−γ(t − ṫ^m_i)},   (7)

where λ*_n ≥ 0, c > 0, γ > 0, w_n = [b_n, w_{1,n}, w_{2,n}, ..., w_{N,n}]^T, h(t) = [1, h_1(t), h_2(t), ..., h_N(t)]^T, and λ_{m,n}(t, ṫ^m_i) is defined in Equation (2). σ(x) = (1 + e^{−x})^{−1} is the sigmoid function. The weight w_{m,n} models the influence of type m on type n and h_m(t) is the

Figure 1 panels: (a) Computational flow of the MR-PP intensity function described in Equations (3)-(7); (b) Poisson thinning; (c) HP vs MR-PP intensity function.

Figure 1: Explanation of the MR-PP. The computation of the intensity function of a MR-PP at time t as a function of the past events is explained in Figure 1a. Figure 1b shows the simulation of a MR-PP, which can be viewed as classification of events generated by a HP as either latent or observed. The point processes of the observed and thinned events are characterized by the λ*(t) intensity multiplied by the probability terms p(t) and 1 − p(t), respectively.
The upper-bounding, mutually exciting intensity and the thinned intensity λ(t) = λ*(t) × p(t), which generates the observed events, are shown in Figure 1c.

aggregated temporal influence of type m up to time t. The computational procedure is illustrated in Figure 1a. The effect of the probability term on the upper-bounding intensity λ*_n(t) is demonstrated in Figure 1c. We can simulate from this model via Poisson thinning [37, 12]. First, we sample N sets of events t^n_1, t^n_2, ..., for n = 1, 2, ..., N, from a HP^N(λ*_n(t)). Afterwards, we chronologically proceed through the simulated events and accept them with probability λ_n(t)/λ*_n(t) = p_n(t), the relative intensity at that point in time (Figure 1b). In case an event at t^n_i is rejected, its offspring (events generated by λ_{n,m}(t, t^n_i)) are pruned, so that the λ*_n(t) defined in Equation (4) depends only on the realized events, whose arrival times are denoted ṫ^m_i. Importantly, the relative intensity p_n(t) and the intensity λ*_n(t) only depend on the preceding events that were accepted; rejected events have no influence on the future intensity. Note that a negative weight w_{m,n} means that events of type m inhibit future events of type n, since h_m(t) decreases p_n(t). The correctness of this procedure is proved in the Supplementary Material.

Although λ*_n(t) could be replaced by a homogeneous Poisson intensity λ*_n, so that any excitatory relationships are captured by a positive weight w_{m,n}, the upper bound λ*_n would have to be given a very large value in cases where the underlying process exhibits sparse event bursts. This, in turn, could yield a large number of latent events and hence render the learning of the model computationally intractable (see Section 3.1 for details). Moreover, the MR-PP is not hardwired to exponential kernels.
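The thinning-based simulation described above can be sketched as follows. This is an equivalent single-pass, Ogata-style variant under the exponential-kernel assumption (our own code and naming, not the authors' implementation): since rejected candidates and their pruned offspring never influence the future intensity, they can simply be discarded as they occur.

```python
import numpy as np

def simulate_mrpp(T, lam0, alpha, delta, W, b, c, gamma, seed=0):
    """Thinning sketch for an MR-PP (Eqs. 3-7) on [0, T]; our own naming.

    lam0  : (N,) exogenous rates;  alpha, delta : (N, N) HP kernel parameters
    W     : (N, N) interaction weights w_{m,n};  b : (N,) biases b_n
    """
    rng = np.random.default_rng(seed)
    N = len(lam0)
    events = [[] for _ in range(N)]   # realized (accepted) events per type

    def lam_star(t):
        # Eq. (4): upper-bounding HP intensity, from accepted events only
        out = lam0.astype(float).copy()
        for m in range(N):
            past = np.array([u for u in events[m] if u < t])
            if past.size:
                out += (alpha[m] * np.exp(-delta[m] * (t - past[:, None]))).sum(axis=0)
        return out

    def p(t):
        # Eqs. (5)-(7): sigmoid thinning probability for each type
        h = np.array([c * np.exp(-gamma * (t - np.array(events[m]))).sum()
                      if events[m] else 0.0 for m in range(N)])
        return 1.0 / (1.0 + np.exp(-(b + W.T @ h)))

    t = 0.0
    while True:
        # Between accepted events the exponential kernels only decay, so the
        # current total lambda* upper-bounds lambda(t) = lambda*(t) * p(t).
        bound = lam_star(t).sum()
        t += rng.exponential(1.0 / bound)
        if t >= T:
            break
        lam = lam_star(t) * p(t)      # Eq. (3)
        u = rng.uniform(0.0, bound)
        cum = np.cumsum(lam)
        if u < cum[-1]:               # accept the candidate and assign a type
            events[int(np.searchsorted(cum, u))].append(t)
    return [np.array(ev) for ev in events]
```

A negative w_{m,n} makes p_n(t) drop after events of type m, so accepted events of type m suppress subsequent arrivals of type n, exactly the inhibitory effect discussed above.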
Alternative kernel functions, such as the power-law or the Rayleigh function, could be used in Equations (2) and (7).

2.4 Hierarchical MR-PP for relational constraints

A dependence between the parameters of the intensity λ*_n(t) and the thinning procedure p_n(t) can be imposed so that an interaction between types m and n is either inhibitory or excitatory (but not both) in a probabilistic manner. To this end, we define a Sparse Normal-Gamma prior for the weights, which fosters an inverse relationship between the excitatory effect α_{m,n} and the repulsive effect w_{m,n} of type m on type n. It is motivated by the framework of Sparse Bayesian Learning [38, 39], in the sense that it associates an individual precision τ_{m,n} and mean μ_{m,n} with each weight w_{m,n}. μ_{m,n} and τ_{m,n} follow a Normal-Gamma distribution that depends on α_{m,n}. It is defined as follows:

τ_{m,n} ~ Gamma(ν_τ φ_τ(α_{m,n}) + α_τ, β_τ),   (8)

μ_{m,n} ~ N(−(ν_μ φ_μ(α_{m,n}) + α_μ)^{−1}, (λ_μ τ_{m,n})^{−1}),   (9)

w_n ~ N(μ_n, Σ_n),   (10)

where ν_τ > 0, α_τ > 0, β_τ > 0, ν_μ > 0, α_μ ≥ 0, λ_μ > 0, μ_n = [μ_0, μ_{1,n}, μ_{2,n}, ..., μ_{N,n}]^T, τ_n = [1/σ_0^2, τ_{1,n}, τ_{2,n}, ..., τ_{N,n}]^T, and Σ_n = diag(τ_n)^{−1}.
\u03c6\u03c4 (x) and \u03c6\u00b5(x) are monotonically\n\u03c4 n = [1/\u03c30\nincreasing positive activation functions.\nA suggested activation function for \u03c4m,n and \u00b5m,n is a shifted and scaled sigmoid function which\nhas a soft-thresholding effect:\n\nwn \u223c N(cid:0)\u00b5n, \u03a3n\n\n(cid:1),\n\n\u03c6(\u03b1m,n) =\n\n1\n\n1 + e\u2212\u03b40(\u03b1m,n\u2212\u03b10)\n\n.\n\n(11)\n\n\u03b10 > 0 can be viewed as the excitation threshold (so that values of \u03b1m,n above \u03b10 indicate an\nexcitatory relationship) and \u03b40 > 0 regulates the smoothness of the thresholding.\nNote that when \u03b1m,n is large (there is excitatory relationship from type m on type n), the precision\n\u03c4m,n becomes large (approximately drawn from Gamma(\u03bd\u03c4 , \u03b2\u03c4 )) assuming \u03bd\u03c4 >> \u03b1\u03c4 and \u03bd\u03c4 >>\n\u03b2\u03c4 . Therefore, the variance \u03c4\u22121\nm,n has a value close to zero with high probability. A similar scenario\nholds for \u00b5m,n if \u03bd\u00b5 >> \u03b1\u00b5. A small mean and variance for wm,n implies that any additional\n(possibly inhibitory) effect of type m on type n is suppressed. A numerical example is given in\nFigure 2a. On the other hand, when \u03b1m,n is small, the precision of \u03c4m,n will take a small value\napproximately drawn from Gamma(\u03b1\u03c4 , \u03b2\u03c4 ) (assuming that \u03bd\u03c4 \u03c6\u03c4 (\u03b1m,n) << \u03b1\u03c4 and \u03b1\u03c4 < \u03b2\u03c4 ).\nSimilarly, \u00b5m,n can take large negative values coming approximately from a Normal distribution\nwith mean \u2212\u03b1\u22121\n\u00b5 . As a consequence, inhibitory effects from type m on type n are enabled. A\nnumerical example is given in Figure 2b.\nDue to the inverse relationship between the inhibitory coef\ufb01cients wm,n and the endogenous inten-\nsity rates \u03b1m,n, relational constraints on pairs of types are established. 
Intuitively, the constants ν_τ, ν_μ control the strength of these constraints, so that w_{m,n} is close to zero for a large α_{m,n} with an adjustable probability. A traditional Hawkes process can be obtained by setting ν_τ, ν_μ, λ_μ, α_τ, α_μ, μ_0 and τ_0 to very large values.

Figure 2 panels: (a) Hierarchical prior for an excitatory relationship; (b) Hierarchical prior for an inhibitory relationship.

Figure 2: Illustration of the behavior of the hierarchical prior for enforcing relational constraints. In 2a the excitatory coefficient α is above the threshold value (0.05), indicating an excitatory relationship; the prior drives the weights to a value close to zero. In 2b the coefficient is below the threshold, indicating an inhibitory relationship; the prior steers the weights to a large negative value. The parameters of the hierarchical prior were set as follows: ν_τ = 100, α_τ = 0.01, β_τ = 1, α_μ = 0.001, ν_μ = 100, λ_μ = 100.

3 Bayesian Inference via Augmentation and Poisson Thinning

Here, we describe the main components of the Bayesian inference algorithm for learning a MR-PP. It is also summarized in Algorithm 1. Full technical details are relegated to the Supplementary Material.

3.1 Generating latent events for tractability

The likelihood of the sequence T ≜ {t_i}_{i=1}^{K} of K events generated by a point process PP(λ(t)) with intensity function λ(t) in the time window [0, T] is [36]:

p(T | λ(t)) = exp{ −∫_0^T λ(t) dt } Π_{i=1}^{K} λ(t_i).   (12)

However, due to the sigmoid term in the intensity function described in Equations (3) and (5), the integral, and therefore sampling from posteriors which contain it, is intractable [12, 13]. This difficulty
This dif\ufb01culty\ncan be overcome by data augmentation [12], in which we jointly consider observed and thinned\nevents akin to the Poisson thinning based sampling procedure mentioned in Section 2.3.\nLet \u02dcTn (cid:44) {\u02dctn\ni=1 be\nthe Kn observed events generated by thinning the process PP(\u03bb\u2217\nn(t)) de\ufb01ned in Equation (4) by\nthe probability 1 \u2212 pn(t) and pn(t) respectively, where pn(t) is de\ufb01ned in Equation (5). De\ufb01ne the\nmerged event sequence to be the ordered set:\n\ni=1 be the sequence of Mn latent (thinned) events of type n and \u02d9Tn (cid:44) { \u02d9tn\n\ni }Mn\n\ni }Kn\n\nTn (cid:44) \u02d9Tn \u222a \u02dcTn = {tn\n\ni }Kn+Mn\n\ni=1\n\n.\n\nThe joint likelihood of the arrival times along with the outcome of the Poisson thinning is then:\n\np(Tn,{sn\n\n(cid:40)\n(cid:90) T\ni }Kn+Mn\n\ni=1\n\n\u2212\n\n0\n\nexp\n\n(cid:41)\n\n| \u03bb\u2217\n\nn(t), pn(t)) =\n\n\u00d7 Kn+Mn(cid:89)\n\ni ) \u00d7 Mn+Kn(cid:89)\n\n\u03bb\u2217\nn(tn\n\npn(tn\n\ni )sn\n\ni (1 \u2212 pn(tn\n\ni ))1\u2212sn\ni ,\n\n\u03bb\u2217\nn(t) dt\n\ni=1\n\ni=1\n\n(13)\n\n(14)\n\n(cid:44) I(tn\n\ni \u2208 \u02d9Tn)\u2208 {0, 1} is the label indicating whether the event at tn\n\ni is realized (belongs\nwhere sn\ni\nto \u02d9Tn) or thinned (belongs to \u02dcTn). Given Equation (14), the integral in the exponential term does not\ninvolve the sigmoidal term induced by pn(t). Therefore, ef\ufb01cient inference for the model parameters\nis feasible and it is reduced to the joint task of learning a Bayesian HP [40] and solving a Bayesian\nbinary logistic regression (see Section 3.2).\n\n3.2 Learning the nonlinear temporal interactions via P\u00b4olya-Gamma augmentation\n\nThe inference of the weights wm,n of the thinning procedure dictated by pn(t) amounts to solving\na binary logistic regression problem for classifying the events as realized or thinned. 
From Equations (5), (10) and (14), and by keeping only the terms of the likelihood which contain w_n, the posterior is obtained:

p(w_n | ...) ∝ N(w_n; μ_n, Σ_n) × Π_{i=1}^{K_n+M_n} e^{(w_n^T h(t^n_i)) s^n_i} / (e^{w_n^T h(t^n_i)} + 1),   (15)

where we have used the property 1 − σ(x) = σ(−x). Sampling from this posterior can be done efficiently via Pólya-Gamma augmentation, as in [e.g. 41, 42, 43, 13]. According to Theorem 1 in [41], the likelihood contribution of the thinning acceptance/rejection of an event at time t^n_i can be rewritten as:

e^{(w_n^T h(t^n_i)) s^n_i} / (e^{w_n^T h(t^n_i)} + 1) ∝ exp(ν^n_i w_n^T h(t^n_i)) × ∫_0^∞ exp{ −(1/2) ω^n_i (w_n^T h(t^n_i))^2 } PGm(ω^n_i; 1, 0) dω^n_i,   (16)

where ν^n_i = s^n_i − 1/2, and PGm(ω^n_i; 1, 0) is the density of a Pólya-Gamma distribution with parameters (1, 0). Combined with a prior on w_n, the integrand in Equation (16) defines a joint density on (s^n_i, ω^n_i, w_n), where ω^n_i is a latent Pólya-Gamma random variable. The posterior conditioned on

Algorithm 1 Bayesian Inference for Mutually Regressive Point Processes

1. Input: Sequences of observed events {Ṫ_n}_{n=1}^{N}.
2. Output: Samples from p(c, γ, {λ*_n, w_n, {α_{m,n}, δ_{m,n}}_{m=1}^{N}}_{n=1}^{N} | {Ṫ_n}_{n=1}^{N}).
3. Initialize randomly the model parameters from the priors.
4. Repeat:
   (a) Sample the thinned events of type n via Poisson thinning, for n = 1, 2, ..., N:
       i. from the exogenous intensity: T̃_n ~ PP(λ*_n (1 − p_n(t))), and
       ii.
from the Poisson processes triggered by the observed events:
           {T̃_n ~ PP(λ_{m,n}(t − ṫ^m_i) (1 − p_n(t)))}_{i=1}^{K_m}, for m = 1, 2, ..., N.
   (b) Sample the latent Pólya-Gamma variables of the observed and latent events:
       {ω^n_i ~ PGm(1, w_n^T h(t^n_i))}_{i=1}^{K_n+M_n}, for n = 1, 2, ..., N (Eq. 21).
   (c) Jointly sample the weight prior parameters and the excitation coefficients {α_{m,n}, μ_{m,n}, τ_{m,n}}_{m,n=1}^{N} via collapsed Metropolis-Hastings.
   (d) Sample the weights for n = 1, ..., N: w_n ~ N(μ̃_n, Σ̃_n) (Eqs. 17, 18, 19 & 20).
   (e) Sample the rest of the parameters c, γ, {λ*_n, {δ_{m,n}}_{m=1}^{N}}_{n=1}^{N}.

the latent ω^n_i random variables becomes:

p(w_n | ...) = N(w_n; μ̃_n, Σ̃_n),   (17)

where

Σ̃_n = (Σ_n^{−1} + H_n^T Ω_n H_n)^{−1},   (18)

μ̃_n = Σ̃_n (Σ_n^{−1} μ_n + H_n^T Ω_n z_n),   (19)

z_n = [ν^n_1 / ω^n_1, ..., ν^n_{K_n+M_n} / ω^n_{K_n+M_n}]^T,   (20)

and H_n = [h(t^n_1), ..., h(t^n_{K_n+M_n})]^T, Ω_n = diag(ω^n_1, ..., ω^n_{K_n+M_n}).

From Theorem 1 in [41], for α = 1 and β = 1, the posterior for sampling ω^n_i is

p(ω^n_i | ...) = p(ω^n_i | {Ṫ_{n'}}_{n'=1}^{N}, w_n, c, γ) = PGm(ω^n_i; 1, w_n^T h(t^n_i)).   (21)

3.3 Gibbs updates for the weights' prior mean and precision, and the intensity parameters

Since only one sample w_{m,n} is available for sampling the mean μ_{m,n} and the precision τ_{m,n}, directly sampling from the posterior p(μ_{m,n}, τ_{m,n} | w_{m,n}, α_{m,n}) would lead to poor mixing. This is also the case for sampling α_{m,n} from p(α_{m,n} | μ_{m,n}, τ_{m,n}, ...).
Therefore, a joint collapsed Metropolis-Hastings update is used for sampling the excitation coefficient α_{m,n} and the weights' prior parameters μ_{m,n} and τ_{m,n}, where the weight w_{m,n} is collapsed. This is similar in spirit to the technique in [38], where a collapsed likelihood is maximized. The collapsed Metropolis-Hastings ratio is derived in the Supplementary Material.

Given the observed and thinned events, conjugate updates are possible for the exogenous intensities λ*_n, assuming a Gamma prior, a cluster-based Hawkes process representation [15], and by incorporating latent parent variables for the observed events [44, 45]. This is also the case for α_{m,n} in the case of a flat MR-PP (defined in Section 2.3). The rest of the parameters are updated via adaptive Metropolis, similar to [40]. The suggested proposal distributions and the Metropolis-Hastings ratios are given in the Supplementary Material.

4 Experimental Results¹

4.1 Synthetic validation

We test our model and inference algorithm on synthetic data to ensure that we can recover the underlying interactions. We generated a MR-PP of two event types with parameters drawn from their priors (see the Supplementary Material for the details) and we simulated it in the interval [0, 20000].

¹The code is available at https://github.com/ifiaposto/Mutually-Regressive-Point-Processes. Our library is written in C++.

Figure 3 panels, effects on Type I: (a) Excitation from Type I; (b) Inhibition from Type I; (c) Excitation from Type II; (d) Inhibition from Type II. Effects on Type II: (e) Excitation from Type I; (f) Inhibition from Type I; (g) Excitation from Type II; (h) Inhibition from Type II.

Figure 3: Posterior distributions of the parameters of the synthetic MR-PP. There is self-excitation and mutual inhibition for both types.
The self-excitation is indicated by the large endogenous intensity rates α_{1,1} (3a) and α_{2,2} (3g) and the small weights w_{1,1} (3b) and w_{2,2} (3h). The mutual inhibition is indicated by the small α_{2,1} (3c) and α_{1,2} (3e) and the large negative w_{2,1} (3d) and w_{1,2} (3f). The correct interactions were discovered.

The derived synthetic dataset consists of 269 observed events that were used for the training. Type I excites events of Type I and inhibits events of Type II. Similarly, Type II inhibits events of Type I and excites events of Type II.

In Figures 3a-3d, we plot the posterior distribution, as well as the posterior mode and mean point estimates, for the parameters α_{1,1}, w_{1,1} (temporal effect from Type I on Type I), and α_{2,1}, w_{2,1} (temporal effect from Type II on Type I). Both the real value and the point estimates for the excitatory effect α_{1,1} (Figure 3a) from Type I are large (above the α_0 = 0.015 threshold) compared to the suppressed, close-to-zero weight w_{1,1} (Figure 3b), indicating an excitatory relationship. On the other hand, as shown in Figure 3d, the weight w_{2,1} has a large negative value in contrast to α_{2,1} (Figure 3c), which has a close-to-zero value, indicating a repulsive relationship. A symmetric case of self-excitation (Figures 3g, 3h) and inhibition from the other type (Figures 3e, 3f) holds for Type II.

Figure 4 shows the predictive log-likelihood for 1,000 held-out event sequences under the real model parameters, in contrast to that achieved by the posterior mode estimates, and the mean absolute error (MAE).
The autocorrelation plots, the values of the hyperparameters, and the learning parameters are provided in the Supplementary Material.

Figure 4: Testing of the learned MR-PP on the synthetic data. The scatterplot compares the log-likelihood for 1000 held-out event sequences of the true vs the learned MR-PP.

4.2 Experimental results on the stability of single neuron spiking dynamics

In this section, we study the quality of the MR-PP as a generative model. Although Point Process Generalized Linear Models (PP-GLMs) have been extensively applied to a wide variety of spiking neuron data [3, 33, 46], when simulated and used as generative models they may yield non-physiological spiking patterns because of explosive firing rates, even though they pass goodness-of-fit tests [35, 47]. This could potentially be attributed to the fact that the excitatory properties are captured by nonlinear terms in the model [48]. On the other hand, the MR-PP inherently circumvents this by decoupling the linear excitatory portion from the nonlinear but unit-bounded, inhibitory portion of the model. We repeat the analysis on two datasets (Figure 2.b and Figure 2.c in [35]) for which PP-GLMs have failed in generating stable spiking dynamics.

Figure 5 panels, monkey cortex (stability analysis of the MR-PP for spike patterns from monkey cortex): (a) Real spike patterns; (b) Goodness-of-fit test; (c) Observed simulated activity; (d) Thinned simulated activity. Human cortex (stability analysis of the MR-PP for spike patterns from human cortex): (e) Real spike patterns; (f) Goodness-of-fit test; (g) Observed simulated activity; (h) Thinned simulated activity.

Figure 5: Stability analysis of the MR-PP for cortex spike patterns. In Figures 5a-5d, we repeat the analysis of Figure 2.c for monkey cortex spike trains, and in Figures 5e-5h, we repeat the analysis of Figure 2.b for human cortex spike trains in [35].
In contrast to the PP-GLM, the MR-PP both passes the goodness-of-fit test (5b), (5f) and generates stable spike trains (5c), (5g) similar to those used for the learning (5a), (5e).

Figure 5a illustrates ten 1-second observations of single-neuron activity from monkey area PMv cortical recordings used in [35]. We fit the MR-PP and applied the time-rescaling theorem [49, 50] to the learned intensities and the real spike sequences. According to it, the realization of a general temporal point process can be transformed to one of a homogeneous Poisson process with unit intensity rate. Therefore, the well-studied Kolmogorov-Smirnov (KS) test can be leveraged for the comparison of the rescaled interspike arrivals to the exponential distribution. Figure 5b shows the KS plot, as in [49], for comparison of the empirical with the exponential distribution. The MR-PP passes the goodness-of-fit test (p-value > 0.05). Finally, we simulated the learned MR-PP for 1 second. Figure 5c shows the observed events of the process. The simulated activity of the learned MR-PP shown in Figure 5c remains physiological and similar to the one used for the training in Figure 5a. Figure 5d shows the rejected (thinned) events of the process (but not their pruned offspring), whose realization could have potentially yielded explosive rates.

It should be noted that the learned MR-PP exhibits a fuzzy behavior: it is both self-excitatory and, after some time, self-inhibitory, capturing in this way a phenomenon of self-regulation [19]. This fact could justify the choice of the soft relational constraints induced by the Sparse Normal-Gamma prior instead of a hard, Bernoulli-dictated constraint (for capturing a purely excitatory or purely inhibitory effect). Figures 5e-5g present a similar analysis for single-neuron activity from human cortex [35]. Note that the learned model in Figure 5g was simulated for a longer period (80 seconds) than the observation in Figure 5e (10 seconds).
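The time-rescaling check can be sketched as follows (our own minimal implementation and naming; the cumulative intensity is integrated numerically on a grid, and the KS distance to the uniform distribution is computed directly rather than via a library call):

```python
import numpy as np

def time_rescaling_ks(times, intensity, T, grid=20000):
    """Time-rescaling goodness-of-fit check [49, 50] (our own sketch).

    Under a well-fit model, the rescaled interarrivals
    Lambda(t_i) - Lambda(t_{i-1}), with Lambda(t) = integral of lambda
    from 0 to t, are i.i.d. Exp(1); mapping them through 1 - exp(-tau)
    gives Uniform(0, 1) draws, compared to the uniform CDF by a KS distance.
    """
    times = np.asarray(times, float)
    ts = np.linspace(0.0, T, grid)
    lam = np.array([intensity(s) for s in ts])
    # cumulative intensity Lambda(t) by the trapezoid rule
    Lam = np.concatenate([[0.0], np.cumsum(0.5 * (lam[1:] + lam[:-1]) * np.diff(ts))])
    taus = np.diff(np.concatenate([[0.0], np.interp(times, ts, Lam)]))
    z = np.sort(1.0 - np.exp(-taus))          # should look Uniform(0, 1)
    n = len(z)
    d_plus = np.max(np.arange(1, n + 1) / n - z)
    d_minus = np.max(z - np.arange(0, n) / n)
    return max(d_plus, d_minus)               # two-sided KS distance
```

Comparing the returned distance to the usual KS critical value at the 0.05 level then reproduces the pass/fail decision used for Figures 5b and 5f.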
We plot only the last 10 seconds. The full simulated spike train for Figure 5g, the learned intensity functions, the values of the hyperparameters, and the parameters of the learning algorithm are provided in the Supplementary Material.

4.3 Experimental results on multi-neuron spike train data

In this section, we apply the proposed model to a dataset consisting of spike train recordings from 25 neurons in the cat primary visual cortex (area 17) under spontaneous activity. The data is publicly available and can be downloaded from the NSF-funded CRCNS data repository [51]. The dataset was acquired with multi-channel silicon electrode arrays that enable simultaneous recording from many single units at once. This is of utmost importance because recordings from multiple neurons at a time are necessary if conclusions about cortical circuit function or network dynamics are to be derived. In Figure 6a, we visualize the spike train used in the experiment. We used the spikes contained in the time window [0, 13000] msec for learning a MR-PP and those in [13000, 26000] msec for testing it. Both the training and the testing spike sequences contain roughly

Figure 6 panels: (a) Multi-neuronal spontaneous activity in cat visual cortex; (b) MR-PP learning curve.

Figure 6: Multi-neuronal spike train analysis. 6a visualizes the spike trains for a population of 25 neurons that were used for fitting and testing the multivariate MR-PP. 6b shows the training log-likelihood of the MR-PP with mode point posterior estimates for an increasing number of MCMC batches of 100 samples. The training log-likelihood reaches that of the fitted PP-GLM; however, the log-likelihood for the held-out, second half of the spike train is larger for the MR-PP and close to the training log-likelihood.

3,000 spikes each.
In Figure 6b, we plot the learning curve (the training data log-likelihood of the realized spike stream with respect to the total number of Markov chain Monte Carlo (MCMC) samples; the 2000 burn-in samples are also included). The predictive log-likelihood (normalized by the number of spikes) achieved by the posterior mode estimates (from the last 3000 MCMC samples) for the second half of Figure 6a is −5.374. We also fit a Poisson-GLM with log link function, assuming intensities of the same form as in [33], provided by the statistical Python package StatsModels. We adjusted the time discretization interval needed to get the spike counts and the order of the regression (Δt = 0.1 msec and Q = 1, respectively), so that the predictive log-likelihood for the spikes in [13000, 26000] is maximized. StatsModels uses Iteratively Reweighted Least Squares (IRLS) for efficiently fitting GLMs. No regularization was incorporated in the model. Assuming that Δt is small enough so that there is at most one spike in each of the B = T/Δt time bins in the interval [0, T], the discrete-time log-likelihood of the J_b spike counts in the time bins T_b, for b = 1, 2, ..., B, is given by

log p(J_{1:B} | θ) = Σ_{b=1}^{B} J_b log(λ(T_b | θ, H_b)) − Σ_{b=1}^{B} λ(T_b | θ, H_b) Δt + J log(Δt),   (22)

where J = Σ_{b=1}^{B} J_b is the total number of spikes, θ the Poisson-GLM parameters, and H_b the spiking count history in the last Q time bins before the b-th time bin. For sufficiently small Δt, it can be proved [33] that Equation (22) is a discrete-time approximation of the continuous-time log-likelihood in Equation (12). For a fair comparison, in Figure 6b we subtract the term J log(Δt) from the log-likelihood reported by StatsModels (Equation (22)).
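Equation (22) is straightforward to evaluate from binned counts and model rates; a minimal sketch (our own helper, not the StatsModels interface):

```python
import numpy as np

def discrete_loglik(counts, rates, dt):
    """Discrete-time log-likelihood of Eq. (22) for binned spike counts.

    counts : (B,) spike counts J_b (0 or 1 for sufficiently small dt)
    rates  : (B,) model intensities lambda(T_b | theta, H_b)
    """
    counts = np.asarray(counts, float)
    rates = np.asarray(rates, float)
    J = counts.sum()
    ll = (counts * np.log(rates)).sum() - (rates * dt).sum() + J * np.log(dt)
    # Subtracting J * log(dt) from ll recovers (approximately) the
    # continuous-time log-likelihood of Eq. (12), as used for Figure 6b.
    return ll
```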
The hyperparameters, the learning parameters, and the inference time are given in the Supplementary Material.

5 Discussion

In this paper, we have presented the first Bayesian, continuous-time point process model that can capture nonlinear, potentially inhibitory, temporal dependencies. A joint prior for the model parameters was designed so that soft relational constraints between types of events are established. Unlike prevalent alternatives, the model recovers physiological single-neuron dynamics, while still achieving competitive forecasting capacity on multi-neuronal recordings.
There are several avenues for practical utility of the proposed model, such as analyses of physiological mechanisms that abound in complex temporal interactions between events of various types and are characterized by relative data scarcity. For example, vital signs monitoring [52, 53], dynamic modeling of biological networks [54, 55], and temporal modeling of clinical events [56], where inhibitory effects may represent, e.g., medical therapies or treatments, are potential application domains of mutually regressive point processes.
A multitude of learning tasks, such as discovering causality [31] or network structure [57, 45], can be augmented with 'signed' relationships, so that they leverage both the excitatory and the inhibitory interactions that the MR-PP can describe. Finally, prior sensitivity analysis, a design strategy for hyperparameter selection, and the development of stochastic variational inference algorithms [44] for large-scale MR-PPs are left for future research.

6 Acknowledgments

This work was partially supported by DARPA under award FA8750-17-2-013, in part by the Alexander Onassis Foundation graduate fellowship, and in part by the A. G. Leventis Foundation graduate fellowship.
We would also like to thank Sibi Venkatesan and Jeremy Cohen for their useful feedback on the paper and Alex Reinhart for helpful discussions.

References

[1] Shuang-Hong Yang and Hongyuan Zha. Mixture of mutually exciting processes for viral diffusion. In International Conference on Machine Learning, pages 1–9, 2013.

[2] Don H Johnson. Point process models of single-neuron discharges. Journal of Computational Neuroscience, 3(4):275–299, 1996.

[3] Jonathan W Pillow, Jonathon Shlens, Liam Paninski, Alexander Sher, Alan M Litke, EJ Chichilnisky, and Eero P Simoncelli. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature, 454(7207):995, 2008.

[4] Alan G Hawkes. Hawkes processes and their applications to finance: a review. Quantitative Finance, 18(2):193–198, 2018.

[5] Emmanuel Bacry, Iacopo Mastromatteo, and Jean-François Muzy. Hawkes processes in finance. Market Microstructure and Liquidity, 1(01):1550005, 2015.

[6] Ahmed M Alaa, Scott Hu, and Mihaela van der Schaar. Learning from clinical judgments: Semi-Markov-modulated marked Hawkes processes for risk prognosis. In Proceedings of the 34th International Conference on Machine Learning, pages 60–69. JMLR.org, 2017.

[7] George O Mohler, Martin B Short, P Jeffrey Brantingham, Frederic Paik Schoenberg, and George E Tita. Self-exciting point process modeling of crime. Journal of the American Statistical Association, 106(493):100–108, 2011.

[8] Alex Reinhart et al. A review of self-exciting spatio-temporal point processes and their applications. Statistical Science, 33(3):299–318, 2018.

[9] Daryl J Daley and David Vere-Jones. An Introduction to the Theory of Point Processes: Volume II: General Theory and Structure. Springer Science & Business Media, 2007.

[10] David R Cox. Some statistical methods connected with series of events.
Journal of the Royal Statistical Society: Series B (Methodological), 17(2):129–157, 1955.

[11] J. F. C. Kingman. Poisson Processes, volume 3. Clarendon Press, 1992.

[12] Ryan Prescott Adams, Iain Murray, and David JC MacKay. Tractable nonparametric Bayesian inference in Poisson processes with Gaussian process intensities. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 9–16. ACM, 2009.

[13] Christian Donner and Manfred Opper. Efficient Bayesian inference of sigmoidal Gaussian Cox processes. The Journal of Machine Learning Research, 19(1):2710–2743, 2018.

[14] Alan G Hawkes. Spectra of some self-exciting and mutually exciting point processes. Biometrika, 58(1):83–90, 1971.

[15] Alan G Hawkes and David Oakes. A cluster process representation of a self-exciting process. Journal of Applied Probability, 11(3):493–503, 1974.

[16] Xenia Miscouridou, Francois Caron, and Yee Whye Teh. Modelling sparsity, heterogeneity, reciprocity and community structure in temporal interaction data. In Advances in Neural Information Processing Systems 31, pages 2343–2352. Curran Associates, Inc., 2018.

[17] Young Lee, Kar Wai Lim, and Cheng Soon Ong. Hawkes processes with stochastic excitations. In International Conference on Machine Learning, pages 79–88, 2016.

[18] Yichen Wang, Bo Xie, Nan Du, and Le Song. Isotonic Hawkes processes. In International Conference on Machine Learning, pages 2226–2234, 2016.

[19] Zhengyu Ma, Gina G Turrigiano, Ralf Wessel, and Keith B Hengen. Cortical circuit dynamics are homeostatically tuned to criticality in vivo. Neuron, 2019.

[20] Arianna Maffei, Sacha B Nelson, and Gina G Turrigiano. Selective reconfiguration of layer 4 visual cortical circuitry by visual deprivation. Nature Neuroscience, 7(12):1353, 2004.

[21] Gianluigi Mongillo, Simon Rumpel, and Yonatan Loewenstein.
Inhibitory connectivity defines the realm of excitatory plasticity. Nature Neuroscience, 21(10):1463, 2018.

[22] Hongyuan Mei and Jason M Eisner. The neural Hawkes process: A neurally self-modulating multivariate point process. In Advances in Neural Information Processing Systems, pages 6754–6764, 2017.

[23] Timothy R Darlington, Jeffrey M Beck, and Stephen G Lisberger. Neural implementation of Bayesian inference in a sensorimotor behavior. Nature Neuroscience, 21(10):1442, 2018.

[24] Thomas Parr, Geraint Rees, and Karl J Friston. Computational neuropsychology and Bayesian inference. Frontiers in Human Neuroscience, 12:61, 2018.

[25] Ian Vernon, Junli Liu, Michael Goldstein, James Rowe, Jen Topping, and Keith Lindsey. Bayesian uncertainty analysis for complex systems biology models: emulation, global parameter searches and evaluation of gene functions. BMC Systems Biology, 12(1):1, 2018.

[26] Robert JH Ross, Ruth E Baker, Andrew Parker, MJ Ford, RL Mort, and CA Yates. Using approximate Bayesian computation to quantify cell–cell adhesion parameters in a cell migratory process. NPJ Systems Biology and Applications, 3(1):9, 2017.

[27] Hongteng Xu, Dixin Luo, and Hongyuan Zha. Learning Hawkes processes from short doubly-censored event sequences. In Proceedings of the 34th International Conference on Machine Learning, pages 3831–3840. JMLR.org, 2017.

[28] Yingxiang Yang, Jalal Etesami, Niao He, and Negar Kiyavash. Online learning for multivariate Hawkes processes. In Advances in Neural Information Processing Systems, pages 4937–4946, 2017.

[29] Rémi Lemonnier, Kevin Scaman, and Argyris Kalogeratos. Multivariate Hawkes processes for large-scale inference. In AAAI, 2017.

[30] Hongteng Xu and Hongyuan Zha. A Dirichlet mixture model of Hawkes processes for event sequence clustering. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R.
Garnett, editors, Advances in Neural Information Processing Systems 30, pages 1354–1363. Curran Associates, Inc., 2017.

[31] Hongteng Xu, Mehrdad Farajtabar, and Hongyuan Zha. Learning Granger causality for Hawkes processes. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1717–1726, New York, New York, USA, 2016.

[32] Hongteng Xu, Lawrence Carin, and Hongyuan Zha. Learning registered point processes from idiosyncratic observations. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 5443–5452, 2018.

[33] Wilson Truccolo, Uri T Eden, Matthew R Fellows, John P Donoghue, and Emery N Brown. A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. Journal of Neurophysiology, 93(2):1074–1089, 2005.

[34] A Stewart Fotheringham and David WS Wong. The modifiable areal unit problem in multivariate statistical analysis. Environment and Planning A, 23(7):1025–1044, 1991.

[35] Felipe Gerhard, Moritz Deger, and Wilson Truccolo. On the stability and dynamics of stochastic spiking neuron models: Nonlinear Hawkes process and point process GLMs. PLoS Computational Biology, 13(2):e1005390, 2017.

[36] Izhak Rubin. Regular point processes and their detection. IEEE Transactions on Information Theory, 18(5):547–557, 1972.

[37] Peter A Lewis and Gerald S Shedler. Simulation of nonhomogeneous Poisson processes by thinning. Naval Research Logistics (NRL), 26(3):403–413, 1979.

[38] Michael E Tipping. Sparse Bayesian learning and the relevance vector machine.
Journal of Machine Learning Research, 1(Jun):211–244, 2001.

[39] Anita C Faul and Michael E Tipping. Analysis of sparse Bayesian learning. In Advances in Neural Information Processing Systems, pages 383–389, 2002.

[40] Jakob Gulddahl Rasmussen. Bayesian inference for Hawkes processes. Methodology and Computing in Applied Probability, 15(3):623–642, 2013.

[41] Nicholas G Polson, James G Scott, and Jesse Windle. Bayesian inference for logistic models using Pólya–Gamma latent variables. Journal of the American Statistical Association, 108(504):1339–1349, 2013.

[42] Scott Linderman, Matthew Johnson, and Ryan P Adams. Dependent multinomial models made easy: Stick-breaking with the Pólya-Gamma augmentation. In Advances in Neural Information Processing Systems, pages 3456–3464, 2015.

[43] Scott Linderman, Ryan P Adams, and Jonathan W Pillow. Bayesian latent structure discovery from multi-neuron recordings. In Advances in Neural Information Processing Systems, pages 2002–2010, 2016.

[44] Scott W Linderman and Ryan P Adams. Scalable Bayesian inference for excitatory point process networks. arXiv preprint arXiv:1507.03228, 2015.

[45] Scott Linderman and Ryan Adams. Discovering latent network structure in point process data. In International Conference on Machine Learning, pages 1413–1421, 2014.

[46] Robert E Kass, Uri T Eden, and Emery N Brown. Analysis of Neural Data, volume 491. Springer, 2014.

[47] Yu Chen, Qi Xin, Valérie Ventura, and Robert E Kass. Stability of point process spiking neuron models. Journal of Computational Neuroscience, 46(1):19–32, 2019.

[48] Michael Eichler, Rainer Dahlhaus, and Johannes Dueck. Graphical modeling for multivariate Hawkes processes with nonparametric link functions.
Journal of Time Series Analysis, 38(2):225–242, 2017.

[49] Emery N Brown, Riccardo Barbieri, Valérie Ventura, Robert E Kass, and Loren M Frank. The time-rescaling theorem and its application to neural spike train data analysis. Neural Computation, 14(2):325–346, 2002.

[50] Felipe Gerhard, Robert Haslinger, and Gordon Pipa. Applying the multivariate time-rescaling theorem to neural population models. Neural Computation, 23(6):1452–1483, 2011.

[51] Tim Blanche. Multi-neuron recordings in primary visual cortex. http://crcns.org/data-sets/vc/pvc-3, 2016.

[52] Mathieu Guillame-Bert and Artur Dubrawski. Classification of time sequences using graphs of temporal constraints. Journal of Machine Learning Research, 18(121):1–34, 2017.

[53] Mathieu Guillame-Bert, Artur Dubrawski, Donghan Wang, Marilyn Hravnak, Gilles Clermont, and Michael R Pinsky. Learning temporal rules to forecast instability in continuously monitored patients. Journal of the American Medical Informatics Association, 24(1):47–53, 2016.

[54] Alexander Groß, Barbara Kracher, Johann M Kraus, Silke D Kühlwein, Astrid S Pfister, Sebastian Wiese, Katrin Luckert, Oliver Pötz, Thomas Joos, Dries Van Daele, et al. Representing dynamic biological networks with multi-scale probabilistic models. Communications Biology, 2(1):21, 2019.

[55] Quan Wang, Rui Chen, Feixiong Cheng, Qiang Wei, Ying Ji, Hai Yang, Xue Zhong, Ran Tao, Zhexing Wen, James S Sutcliffe, et al. A Bayesian framework that integrates multi-omics data and gene networks predicts risk genes from schizophrenia GWAS data. Nature Neuroscience, page 1, 2019.

[56] Edward Choi, Mohammad Taha Bahadori, Andy Schuetz, Walter F Stewart, and Jimeng Sun. Doctor AI: Predicting clinical events via recurrent neural networks. In Machine Learning for Healthcare Conference, pages 301–318, 2016.

[57] B. Mark, G. Raskutti, and R. Willett.
Network estimation from point process data. IEEE Transactions on Information Theory, 65(5):2953–2975, May 2019.