{"title": "Gaussian process modulated renewal processes", "book": "Advances in Neural Information Processing Systems", "page_first": 2474, "page_last": 2482, "abstract": "Renewal processes are generalizations of the Poisson process on the real line, whose intervals are drawn i.i.d. from some distribution. Modulated renewal processes allow these distributions to vary with time, allowing the introduction nonstationarity. In this work, we take a nonparametric Bayesian approach, modeling this nonstationarity with a Gaussian process. Our approach is based on the idea of uniformization, allowing us to draw exact samples from an otherwise intractable distribution. We develop a novel and efficient MCMC sampler for posterior inference. In our experiments, we test these on a number of synthetic and real datasets.", "full_text": "Gaussian process modulated renewal processes\n\nVinayak Rao\n\nYee Whye Teh\n\nGatsby Computational Neuroscience Unit\n\nGatsby Computational Neuroscience Unit\n\nUniversity College London\n\nvrao@gatsby.ucl.ac.uk\n\nUniversity College London\n\nywteh@gatsby.ucl.ac.uk\n\nAbstract\n\nRenewal processes are generalizations of the Poisson process on the real line\nwhose intervals are drawn i.i.d. from some distribution. Modulated renewal pro-\ncesses allow these interevent distributions to vary with time, allowing the introduc-\ntion of nonstationarity. In this work, we take a nonparametric Bayesian approach,\nmodelling this nonstationarity with a Gaussian process. Our approach is based on\nthe idea of uniformization, which allows us to draw exact samples from an oth-\nerwise intractable distribution. We develop a novel and ef\ufb01cient MCMC sampler\nfor posterior inference. In our experiments, we test these on a number of synthetic\nand real datasets.\n\n1\n\nIntroduction\n\nRenewal processes are stochastic point processes on the real line where intervals between succes-\nsive points (times) are drawn i.i.d.\nfrom some distribution. 
The simplest example of a renewal process is the homogeneous Poisson process, whose interevent times are exponentially distributed. A limitation of this is the memoryless property of the exponential distribution, resulting in an 'as bad as old after a repair' property [1] that is not true of many real-world phenomena. For example, immediately after firing, a neuron is depleted of its resources and incapable of firing again, and the gamma distribution is used to model interspike intervals [2]. Similarly, because of the phenomenon of elastic rebound, some time is required to recharge stresses released after an earthquake, and an inverse Gaussian distribution is used to model intervals between major earthquakes [3]. Other examples include using the Pareto distribution to better capture the burstiness and self-similarity of network traffic arrival times [4], and the Erlang distribution to model the fact that buying incidence of frequently purchased goods is less variable than Poisson [5].

Modelling interevent times as i.i.d. draws from a general renewal density can allow larger or smaller variances than an exponential with the same mean (overdispersion or underdispersion), but effectively encodes an 'as good as new after a repair' property. Again, this is often only an approximation: because of age or other time-varying factors, the interevent distribution of the point process may vary with time. For instance, internet traffic can vary with time of the day, day of the week, and in response to advertising and seasonal trends. Similarly, an external stimulus can modulate the firing rate of a neuron, economic trends can modulate financial transactions, etc.
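Before turning to nonstationarity, the stationary case is easy to state in code. The sketch below (Python with NumPy; the function name `sample_renewal` is ours, purely for illustration) draws a gamma renewal process by accumulating i.i.d. interevent times:

```python
import numpy as np

def sample_renewal(T, draw_interval, rng):
    """Event times on [0, T] of a stationary renewal process whose
    i.i.d. interevent times are drawn by draw_interval(rng)."""
    times, t = [], 0.0
    while True:
        t += draw_interval(rng)
        if t > T:
            break
        times.append(t)
    return np.array(times)

rng = np.random.default_rng(0)
# Gamma(shape=3) intervals with unit mean: a refractory process whose
# interevent times are less variable than exponential (underdispersion).
events = sample_renewal(50.0, lambda r: r.gamma(3.0, 1.0 / 3.0), rng)
```

Replacing the gamma draw with `r.exponential(1.0)` recovers the homogeneous Poisson process.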
The most popular way of modelling this nonstationarity is via an inhomogeneous Poisson process whose intensity function determines the instantaneous event rate, and there has also been substantial work extending this to renewal processes in various ways (see section 2.2).

In this paper, we describe a nonparametric Bayesian approach where a renewal process is modulated by a random intensity function which is given a Gaussian process prior. Our approach extends work by [6] on the Poisson process, using a generalization of the idea of Poisson thinning called uniformization [7] to draw exact samples from the model. We extend recent ideas from [8] to develop a more natural and efficient block Gibbs sampler than the incremental Metropolis-Hastings algorithm used in [6]. In our experiments we demonstrate the usefulness of our model and sampler on a number of synthetic and real-world datasets.

2 Modulated renewal processes

Consider a renewal process R over an interval [0, T] whose interevent time is distributed according to a renewal density g. Let G = {G_1, G_2, ...} be the ordered set of event times sampled from this renewal process, i.e.

G ∼ R(g)    (1)

For simplicity¹ we place a starting event G_0 at time 0, so for each i ≥ 1 we have (G_i − G_{i−1}) ∼ g. Associated with the renewal density g is a hazard function h, where h(τ)∆, for infinitesimal ∆ > 0, is the probability of the interevent interval being in [τ, τ + ∆] conditioned on it being at least τ, i.e.

h(τ) = g(τ) / (1 − ∫_0^τ g(u) du)    (2)

Let λ(t) be some time-varying intensity function.
A simple way to introduce nonstationarity into a renewal process is to modulate the hazard function by λ(t) so that it depends on both the time τ since the last event, and on the absolute time t [9, 10]:

h(τ, t) ≡ m(h(τ), λ(t))    (3)

where m(·, ·) is some interaction function. Examples include additive (h(τ) + λ(t)) and multiplicative (h(τ)λ(t)) interactions. For concreteness, we assume multiplicative interactions in what follows; however, our results extend easily to general interaction functions.

With a modulated hazard rate, the distribution of interevent times is no longer stationary. Instead, plugging a multiplicative interaction into (2) and solving for g (see the supplementary material for details), we get

g(τ | t_prev) = λ(t_prev + τ) h(τ) exp( − ∫_0^τ λ(t_prev + u) h(u) du )    (4)

where t_prev is the previous event time. Observe that equation (4) encompasses the inhomogeneous Poisson process as a special case (a constant hazard function with multiplicative modulation).

2.1 Gaussian process intensity functions

In this paper we are interested in estimating both the parameters of the hazard function h(τ) and the intensity function λ(t) itself. Taking a Bayesian nonparametric approach, we model λ(t) using a Gaussian process (GP) [11] prior, which has support over a rich class of functions and offers a flexibility not afforded by parametric approaches. We call the resulting model a Gaussian process modulated renewal process. A minor issue is that samples from a GP can take negative values; we address this using a sigmoidal link function. Finally, we use a gamma family for the hazard function:

h(τ) = τ^{γ−1} e^{−γτ} / ∫_τ^∞ u^{γ−1} e^{−γu} du

where γ is the shape parameter².
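Since the gamma hazard and the modulated interevent density of equation (4) are the quantities everything else builds on, here is a small numerical sketch (Python with NumPy/SciPy; the function names `gamma_hazard` and `modulated_density` are ours, and the quadrature grid is an illustrative choice, not part of the model):

```python
import numpy as np
from scipy.special import gammaincc
from scipy.stats import gamma as gamma_dist

def gamma_hazard(tau, shape):
    """Hazard h(tau) = pdf / survival (equation (2)) of a gamma renewal
    density parametrized to have unit mean (rate = shape)."""
    pdf = gamma_dist.pdf(tau, shape, scale=1.0 / shape)
    surv = gammaincc(shape, shape * np.asarray(tau, dtype=float))  # P(interval > tau)
    return pdf / surv

def modulated_density(tau, t_prev, lam, shape, n_grid=2000):
    """Interevent density g(tau | t_prev) of equation (4) under a
    multiplicative interaction; the integral is approximated by the
    trapezoidal rule. `lam` must accept NumPy arrays."""
    u = np.linspace(0.0, tau, n_grid)
    integrand = lam(t_prev + u) * gamma_hazard(np.maximum(u, 1e-12), shape)
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(u))
    return lam(t_prev + tau) * gamma_hazard(tau, shape) * np.exp(-integral)
```

With λ(t) ≡ 1 and γ = 1 the hazard is identically 1 and g(τ | t_prev) = e^{−τ}: the homogeneous Poisson special case noted after equation (4).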
Our complete model is thus

l(·) ∼ GP(µ, K),    λ(·) = λ* σ(l(·)),    G ∼ R(λ(·), h(·))    (5)

where µ and K are the GP mean and covariance kernel, λ* is a positive scale parameter, and σ(x) = (1 + exp(−x))^{−1}. We place a gamma hyperprior on λ*, as well as hyperpriors on the GP hyperparameters.

2.2 Related work

The idea of defining a nonstationary renewal process by modulating the hazard function dates back to Cox [9]. Early work [12] focussed on hypothesis testing for the stationarity assumption. [13, 14, 1] proposed parametric (generalized linear) models where the intensity function was a linear combination of some known functions; these regression coefficients were estimated via maximum likelihood. [15] considers general modulated hazard functions as well; however, they assume the hazard has known form and are concerned with calculating statistical properties of the resulting process. Finally, [10] describe a model that is a generalization of ours, but again have to resort to maximum likelihood estimation (our ideas can easily be extended to their more general model too).

A different approach to producing inhomogeneity is by first sampling from a homogeneous renewal process and then rescaling time [16, 17]. The trend renewal process [18] uses such an approach, and the authors propose an iterative kernel smoothing scheme to approximate a maximum likelihood estimate of the intensity function. [2] uses time-rescaling to introduce inhomogeneity and, similar to us, a Gaussian process prior for the intensity function.

¹With renewal processes there is an ambiguity about the time of the first event, which is typically taken to be exponentially distributed. It is straightforward to handle this case.
²We parametrize the hazard function to produce 1 event per unit time; other parametrizations may be used.
Unlike us, they had to discretize time and use a variational approach to inference.

Finally, we note that our approach generalizes [6], who describe a doubly stochastic Poisson process and an MCMC sampler which does not require time discretization. In the next sections we describe a generalization of their model to the inhomogeneous renewal process using a twist on a classical idea called uniformization.

3 Sampling via Uniformization

Before we consider Markov chain Monte Carlo (MCMC) inference for our model, observe that even naïvely generating samples from the prior is difficult; this requires evaluating integrals of a continuous-time function drawn from a GP (see equation (4)). One approach is to evaluate these integrals numerically by discretizing time [2], which can be time consuming and introduces approximation errors. In section 3.2 we will show how a classical idea called uniformization allows us to efficiently draw exact samples from the model, without approximations due to discretization. Then in section 4 we will develop a novel MCMC algorithm based on uniformization.

3.1 Modulated Poisson processes

We start with thinning, a well-known method to sample from an inhomogeneous Poisson process with intensity λ(t). Suppose that λ(t) is upper bounded by some constant Ω. Let E be a set of locations sampled from a homogeneous Poisson process with rate Ω. We thin this set by deleting each point e ∈ E independently with probability 1 − λ(e)/Ω. Let F be the remaining set of points. Then:

Proposition 1 ([19]). The set F is a draw from a Poisson process with intensity function λ(t).

3.2 Modulated renewal processes

Less well-known is a generalization of this result to renewal processes [13].
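As a concrete illustration of Proposition 1, the following self-contained Python sketch (our own names; NumPy only) implements the thinning scheme of section 3.1; the renewal-process generalization discussed next replaces these independent coin flips with a Markov chain over the candidate points:

```python
import numpy as np

def sample_poisson_by_thinning(T, lam, Omega, rng):
    """Proposition 1: thin a rate-Omega homogeneous Poisson process,
    keeping each candidate e with probability lam(e) / Omega, to obtain
    a draw from the Poisson process with intensity lam(t) on [0, T]."""
    n = rng.poisson(Omega * T)                    # number of candidate points
    E = np.sort(rng.uniform(0.0, T, size=n))      # candidate locations, given n
    keep = rng.uniform(size=n) < lam(E) / Omega   # independent thinning
    return E[keep]

rng = np.random.default_rng(1)
# A toy intensity bounded above by Omega = 2 (an illustrative choice):
F = sample_poisson_by_thinning(10.0, lambda t: 1.0 + np.sin(t) ** 2, 2.0, rng)
```

If λ(t) ≡ Ω nothing is thinned and F reduces to the homogeneous rate-Ω sample itself.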
Note that the thinning result of the previous section builds on the memoryless property of the exponential distribution (or the complete randomness [20] of the Poisson process): events in disjoint sets occur independently of each other. For a renewal process, events are no longer independent of their neighbours. This suggests a generalization of thinning involving a Markov chain over the set of events. This idea of thinning a Poisson process by a subordinated Markov chain is called uniformization [7].

[21] describes a uniformization scheme to sample from a homogeneous renewal process. We extend it to the modulated case here. We will assume that both the intensity function λ(t) and the hazard function h(τ) are bounded, so that there exists a constant Ω such that

Ω ≥ max_{t,τ} h(τ)λ(t)    (6)

Note that because of the sigmoidal link function, our model has λ(t) ≤ λ*, while the gamma hazard h(τ) is bounded by the shape parameter γ if γ ≥ 1. We now sample a set of times E = {E_0 = 0, E_1, E_2, . . .} from a homogeneous Poisson process with rate Ω and thin this set by running a discrete-time Markov chain on the times in E. Let Y_0 = 0, Y_1, Y_2, . . . be an integer-valued Markov chain, where each Y_i either equals Y_{i−1} or i. We interpret Y_i as indicating the index of the last unthinned event prior or equal to E_i. That is, Y_i = Y_{i−1} means that E_i is thinned, and Y_i = i means E_i is not thinned. Note that E_i − E_{Y_i} gives the time since the last unthinned event. For i > j ≥ 0, define the transition probabilities of the Markov chain (conditioned on E) as follows,

p(Y_i = i | Y_{i−1} = j) = h(E_i − E_j)λ(E_i) / Ω,    p(Y_i = j | Y_{i−1} = j) = 1 − h(E_i − E_j)λ(E_i) / Ω    (7)

After drawing a sample from Y, we define F = {E_i ∈ E s.t. Y_i = i}.

Proposition 2.
For any Ω ≥ max_{t,τ} h(τ)λ(t), F is a sample from a modulated renewal process with hazard h(·) and modulating intensity λ(·).

The proof of this is included in the supplementary material. The basic idea is to write down the probability p(E, Y) of the whole generative process and marginalize out the thinned times, showing that the resulting interevent density is simply (4). For a different proof of a similar result, see [13].

Now recall that we have a GP prior for l(·). The uniformization procedure above only requires the intensity function evaluated at the times in E (which is finite on a finite interval), and this is easily obtained by sampling from a finite dimensional Gaussian N(µ_E, K_E), with mean and covariance being the corresponding GP parameters µ and K evaluated at E. Our procedure to sample from a GP-modulated renewal process now follows: sample from a homogeneous Poisson process P(Ω) on [0, T], instantiate the GP on this finite set of points, and then thin the set by running the Markov chain described previously. Defining l_E as l(t) evaluated on the set E, E*_i as the restriction of E to the interval (F_{i−1}, F_i), and F_{|F|+1} = T, we can write the joint distribution:

P(F, l, E) = Ω^{|E|} e^{−ΩT} N(l_E | µ_E, K_E) ∏_{i=1}^{|F|} [ λ(F_i)h(F_i − F_{i−1}) / Ω ] ∏_{i=1}^{|F|+1} ∏_{e ∈ E*_i} [ 1 − λ(e)h(e − F_{i−1}) / Ω ]    (8)

4 Inference

We now consider posterior inference on the modulating function λ(t) (and any unknown hyperparameters) given an observed set of event times G. Our sampling algorithm is based on ideas developed in [8]. We imagine G was generated via uniformization, so that there exists an unobserved set of thinned events G̃.
We then proceed by Markov chain Monte Carlo, setting up a Markov chain whose state consists of the number and locations of the thinned events G̃, the values of the GP on the set G ∪ G̃, as well as the current sampled hyperparameters. Note from equation (8) that given these values, the value of the modulating function at any other location is independent of the observations and can be sampled from the conditional distribution of a multivariate Gaussian.

The challenge now is to construct a transition operator that results in this Markov chain having the desired posterior distribution as its equilibrium distribution. In their work, [6] defined a transition operator by proposing insertions and deletions of thinned events as well as by perturbing their locations. The proposals were accepted or rejected using a Metropolis-Hastings correction. The remaining variables were updated using standard Gaussian process techniques. We show below that instead of incrementally updating G̃, it is actually possible to produce a new independent sample of the entire set G̃ (conditioned on all other variables).
This leads to a more natural sampler that does not require any external tuning and that mixes more rapidly.

To understand our algorithm, suppose first that the modulating function λ(t) is known for all t. Then, from (4), the probability of the set of events G on the interval [0, T] is³:

P(G | λ(t)) = ∏_{i=1}^{|G|} λ(G_i)h(G_i − G_{i−1}) × ∏_{i=1}^{|G|+1} exp( − ∫_{G_{i−1}}^{G_i} λ(t)h(t − G_{i−1}) dt )    (9)

Now, suppose that in each consecutive interval (G_{i−1}, G_i) we independently sample a set of events G̃*_i from an inhomogeneous Poisson process with rate (Ω − λ(t)h(t − G_{i−1})), and let G̃ = ∪ G̃*_i. A little algebra shows that:

P(G̃, G | λ(t)) = ∏_{i=1}^{|G|+1} [ exp( − ∫_{G_{i−1}}^{G_i} (Ω − λ(t)h(t − G_{i−1})) dt ) ∏_{g̃ ∈ G̃*_i} (Ω − λ(g̃)h(g̃ − G_{i−1})) ]
    × ∏_{i=1}^{|G|} λ(G_i)h(G_i − G_{i−1}) × ∏_{i=1}^{|G|+1} exp( − ∫_{G_{i−1}}^{G_i} λ(t)h(t − G_{i−1}) dt )    (10)

= Ω^{|G|+|G̃|} exp(−ΩT) ∏_{i=1}^{|G|} [ λ(G_i)h(G_i − G_{i−1}) / Ω ] ∏_{i=1}^{|G|+1} ∏_{g̃ ∈ G̃*_i} [ 1 − λ(g̃)h(g̃ − G_{i−1}) / Ω ]    (11)

³Recall that G_0 = 0. We also take G_{|G|+1} = T.

Comparing with equation (8), we have the following proposition:

Proposition 3. The sets (E, F) and (G ∪ G̃, G) are equivalent, i.e.
they have the same distribution.

In other words, given a set of event times G, the inhomogeneous Poisson process-distributed points G̃ can be taken to be the events thinned in the procedure of section 3.2. The only complication left is that we do not know the function λ(t) everywhere. This is easily overcome by uniformization (in fact, just by thinning, since we are dealing with a Poisson process). Specifically, let G be the set of observed events and G̃_prev the previous set of thinned events. To sample the new set G̃*_i from the Poisson process on [G_{i−1}, G_i] with rate (Ω − λ(t)h(t − G_{i−1})), we first sample a set of points A from a homogeneous Poisson process on [G_{i−1}, G_i] with rate Ω and instantiate the Gaussian process on those points, conditioned on G ∪ G̃_prev and l_{G ∪ G̃_prev} (note that all this involves is conditionally sampling from a multivariate Gaussian⁴). Finally, we keep a ∈ A with probability 1 − λ(a)h(a − G_{i−1})/Ω. Having resampled G̃ (and the associated set of GP values), we next must resample the value of the GP at G. This does involve the sigmoid likelihood function, and we proceed by elliptical slice sampling [22]⁵. Algorithm 1 lists the steps involved.

Algorithm 1 Blocked Gibbs sampler for a GP-modulated renewal process on the interval [0, T]
Input: Set of event times G, set of thinned times G̃_prev and l instantiated at G ∪ G̃_prev.
Output: A new set of thinned times G̃_new and a new instantiation l_{G ∪ G̃_new} of the GP on G ∪ G̃_new.
1: Sample A ⊂ [0, T] from a Poisson process with rate Ω.
2: Sample l_A | l_{G ∪ G̃_prev}.
3: Thin A, keeping element a ∈ A ∩ [G_{i−1}, G_i] with probability 1 − λ* σ(l(a)) h(a − G_{i−1}) / Ω.
4: Let G̃_new be the resulting set and l_{G̃_new} be the restriction of l_A to this set. Discard G̃_prev and l_{G̃_prev}.
5: Resample l_{G ∪ G̃_new} using, for example, elliptical slice sampling.

The gamma prior on λ* is conjugate to the Poisson, resulting in a gamma posterior. We resampled the GP hyperparameters using slice sampling [23]⁵, while parameters of the hazard function were updated using Metropolis-Hastings moves along with equation (8).

4.1 Computational considerations

The inferential bottleneck in our model is the Gaussian process: sampling a GP on a set of points is, in the worst case, cubic in the size of that set. In our model, each iteration sees on average |G| + 2|E| values of the GP, where |G| is the number of observations and |E| is the average number of points sampled from the subordinating Poisson process. Note that |E| varies from iteration to iteration (being proportional to the scaling factor λ*). Since we perform posterior inference on this quantity, the complexity of our model can be thought of as adapting to that of the problem. This is in contrast with time-discretization approaches, where a resolution is picked beforehand, fixing the complexity of the inference problem accordingly. For instance, [2] use a resolution of 1 ms to model neural spiking, making it impossible to naïvely deal with spike trains extending over more than a second. However, as they demonstrate in their work, instantiating a GP on a regular lattice allows the development of fast approximate inference algorithms that scale linearly with the number of grid points. In our case, the Gaussian process is sampled at random locations. Moreover, these locations change each iteration, requiring the inversion of a new covariance matrix; this is the price we have to pay for an exact sampler.

One approach is to try to reduce the number of thinned events |E|.
Recall that our generative approach is to thin a sample from a subordinating, homogeneous Poisson process whose rate upper bounds the modulated hazard rate. We can reduce the number of thinned events by subordinating to an inhomogeneous Poisson process, one whose rate more closely resembles the instantaneous hazard rate. Thus, instead of using a single constant λ*, one could use (say) a piecewise linear function λ*(t). The more segments we use, the more flexibility we have; the price is the complexity of resampling this function, and slower mixing because of the correlations it introduces.

⁴In particular, it does not require any sophisticated GP sampling algorithm.
⁵Code available on Iain Murray's website: http://homepages.inf.ed.ac.uk/imurray2/

This, however, does not help if |G|, the number of observations, is itself large. In such a situation one has to call upon the vast literature concerning approximate inference for Gaussian processes [11]. The question then is how these approximations compare with those like [2]. We believe this is an interesting question in its own right, and it raises the possibility of approximate inference algorithms that combine ideas from [2] with the adaptive nature of our approach.

5 Experiments

In this section we evaluate our model and sampler on a number of datasets. We used gamma distributed interevent times with shape parameter γ ≥ 1. When γ = 1, we recover the Poisson process, and our model reduces to that of [6], while γ > 1 models 'refractoriness', where two events in quick succession are less likely than under a Poisson process. When appropriate, we place a noninformative prior on the shape parameter: an exponential with rate 0.1 shifted to have a minimum value of 1. Note that for shape parameters less than 1, the renewal process becomes 'bursty' and the hazard function becomes unbounded.
This is an interesting scenario but beyond the scope of this paper.

An interesting issue concerns the identifiability of the shape parameter under our model. We find from our experiments that this is only a problem when the length scale of the intensity function is comparable to the refractory period of the renewal process. The base rate of the modulated renewal process (i.e. the rate when the intensity function is fixed at 1) is set to the empirical rate of the observed point process. As a result, the identifiability of the shape parameter is a consequence of the dispersion of the point process rather than of some sort of rate matching.

Synthetic data. Our first set of experiments uses three synthetic datasets generated by modulating a gamma renewal process (shape parameter γ = 3) with three different functions (see figure 1):

• λ1(t) = 2 exp(t/5) + exp(−((t − 25)/10)²), t ∈ [0, 50]: 44 events
• λ2(t) = 5 sin(t²) + 6, t ∈ [0, 5]: 12 events
• λ3(t): a piecewise linear function, t ∈ [0, 100]: 153 events

Additionally, for each function, we also generated 10 test sets. We ran three settings of our model: with the shape parameter fixed to 1 (MRP Exp), with the shape parameter fixed to the truth (MRP Gam3), and with a hyperprior on the shape parameter (MRP Full). For comparison, we also ran an approximate discrete-time sampler where the Gaussian process was instantiated on a regular grid covering the interval of interest. In this case, all intractable integrals were approximated numerically, and we used elliptical slice sampling to run MCMC on this Gaussian vector.

Figure 1 shows the results from 5000 MCMC samples after a burn-in of 1000 samples. We quantify these in Table 1 by calculating the l2 distance of the posterior means from the truth. We also calculated the mean predictive probabilities of the 10 test sequences.
Not surprisingly, the inhomogeneous Poisson process forms a poor approximation to the gamma renewal process; it underestimates the intensity function required to produce a sequence of events with refractory intervals. Fixing the shape parameter to the truth significantly reduces the l2 error and increases the predictive probabilities, but interestingly, for these datasets, the model with a prior on the shape parameter performs comparably with the 'oracle' model. We have also included plots of the posterior distribution over the gamma shape parameter; these are peaked around 3. Discretizing time into 100 bins (Disc100) results in comparable performance on the l2 error for the first two datasets; for the third (which spans a longer interval and has a larger event count), we had to increase the resolution to 500 bins to improve accuracy. Discretizing to 25 bins was never sufficient. A conclusion is that with time discretization, to keep the bias small, one must be conservative in choosing the time resolution; however, evaluating a GP on a fine grid can result in slow mixing. Our sampler has the advantage of automatically picking the 'right' resolution. However, as we discussed in the section on computation, time discretization has its own advantages that make it a viable model [2].

Coal mine disaster data. For our next experiment, we ran our model on the coal mine disaster dataset commonly used in the point process literature. This dataset records the dates of a series of 191 coal mining disasters, each of which killed ten or more men [24].
Figure 2(left) shows the posterior mean of the intensity function (surrounded by 1 standard deviation) returned by our model. Not included is the posterior distribution over the shape parameter; this concentrated in the interval 1 to around 1.1, suggesting that the data is well modelled as an inhomogeneous Poisson process, and is in agreement with [24]. As a sanity check, and to shed further light on the issue of identifiability, we processed the dataset by deleting every alternate event. With such a transformation, a homogeneous Poisson process would reduce to a gamma renewal process with shape 2. Our model returns a posterior peaked around 1.5 (in agreement with the form of the inhomogeneity). Note that the posteriors over intensity functions are similar (except for the obvious scaling factor of about 2).

Figure 1: Synthetic Datasets 1-3: Posterior mean intensities plotted against time (top) and gamma shape posteriors (bottom).

                           MRP Exp    MRP Gam3   MRP Full   Disc25       Disc100
Dataset 1  l2 error        7.8458     3.19       2.548      4.089003     2.426973
           log pred. prob. -47.5469   -38.0703   -37.3712   -41.646350   -41.016425
Dataset 2  l2 error        141.0067   56.2183    58.4361    91.321069    57.896300
           log pred. prob. -3.704396  -2.945298  -3.280871  -5.245478    -3.848443
Dataset 3  l2 error        82.0289    11.4167    13.4441    122.335151   38.047332
           log pred. prob. -89.8787   -48.2777   -48.57     87.170034    -55.802997

Table 1: l2 distance from the truth and mean log-predictive probabilities of the held-out datasets for synthetic datasets 1 (top) to 3 (bottom).

Spike timing data. We next ran our model on neural spike train data recorded from grasshopper auditory receptor cells [25]. This dataset is characterized by a relatively high firing rate (∼ 150 Hz), making refractory effects more prominent. We plot the posterior distribution over the intensity function given a sequence of 200 spikes in a 1.6 second interval. We also included the posterior distribution over gamma shape parameters in figure 3; this concentrates around 1.5, agreeing with the refractory nature of neuronal firing. The results above follow from using noninformative hyperpriors; we have also plotted the log-transformed stimulus, an amplitude-modulated signal. In practice, other available knowledge (viz. the shape parameter, the stimulus length-scale, the transformation from the stimulus to the input of the neuron, etc.) can be used to make more accurate inferences.

Figure 2: Left: Posterior mean intensity for coal mine data with 1 standard deviation error bars (plotted against time in years). Centre: Posterior mean intensity for 'thinned' coal mine data with 1 standard deviation error bars. Right: Gamma shape posterior for 'thinned' coal mine data.

Figure 3: Left: Posterior mean intensity for neural data with 1 standard deviation error bars. Superimposed is the log stimulus (scaled and shifted). Right: Posterior over the gamma shape parameter.

        Synthetic dataset 1                              Coal mine dataset
        Mean ESS        Minimum ESS     Time (sec)       Mean ESS       Minimum ESS    Time (sec)
Gibbs   93.45 ± 6.91    50.94 ± 5.21    77.85            53.54 ± 8.15   24.87 ± 7.38   282.72
MH      56.37 ± 10.30   19.34 ± 11.55   345.44           47.83 ± 9.18   18.91 ± 6.45   1703

Table 2: Sampler comparisons. Numbers are per 1000 samples.

Computational efficiency and mixing. For our final experiment, we compare our proposed blocked Gibbs sampler with the Metropolis-Hastings sampler of [6].
We ran both algorithms on two datasets: synthetic dataset 1 from section 5 and the coal mine disaster dataset. All comparisons involved 20 MCMC runs with 5000 iterations each (following a burn-in period of 1000 iterations). For both datasets, we evaluated the latent GP on a uniform grid of 200 points, calculating the effective sample size (ESS) of each component of the Gaussian vectors (using R-CODA [26]). For each run, we return the mean and the minimum ESS across all 200 components. In Table 2, we report these numbers: not only does our sampler mix faster (resulting in larger ESSs), it also takes less computation time. Additionally, our sampler is simpler and more natural to the problem, and does not require any external tuning.

6 Discussion

We have described how to produce exact samples from a nonstationary renewal process whose hazard function is modulated by a Gaussian process. Our scheme is based on the idea of uniformization, and using this idea, we also develop a novel MCMC sampler. There are a number of interesting avenues worth following. First is the restriction that the hazard function be bounded: while this covers a large and useful class of renewal processes, it is worth considering how our approach can be extended to produce exact or approximate samples for renewal processes with unbounded hazard functions. In any case, following [13], it is easy to extend our ideas to Bayesian inference for more general point processes. Because of the latent Gaussian process, our approach will not scale well to large problems; however, there is a vast literature concerning approximate sampling for Gaussian processes. An important question is how these approximations compare to approximations introduced via time-discretization. Finally, even though we considered GP modulating functions, our uniformization-based sampler will also be useful for Bayesian inference involving simpler priors on modulating functions, e.g.
splines or Markov jump processes.

Acknowledgements

We thank the Gatsby Charitable Foundation for generous funding. We thank Ryan Adams and Iain Murray for code and comments, and Jakob Macke and Lars Buesing for useful discussions. The grasshopper data was collected by Ariel Rokem at Andreas Herz's lab and provided through the CRCNS program (http://crcns.org).

References

[1] J. F. Lawless and K. Thiagarajah. A point-process model incorporating renewals and time trends, with application to repairable systems. Technometrics, 38(2):131–138, 1996.

[2] John P. Cunningham, Byron M. Yu, Krishna V. Shenoy, and Maneesh Sahani. Inferring neural firing rates from spike trains using Gaussian processes. In Advances in Neural Information Processing Systems 20, 2008.

[3] T. Parsons. Earthquake recurrence on the south Hayward fault is most consistent with a time dependent, renewal process. Geophysical Research Letters, 35, 2008.

[4] V. Paxson and S. Floyd. Wide area traffic: the failure of Poisson modeling. IEEE/ACM Transactions on Networking, 3(3):226–244, June 1995.

[5] C. Wu. Counting your customers: Compounding customer's in-store decisions, interpurchase time and repurchasing behavior. European Journal of Operational Research, 127(1):109–119, November 2000.

[6] Ryan P. Adams, Iain Murray, and David J. C. MacKay. Tractable nonparametric Bayesian inference in Poisson processes with Gaussian process intensities. In Proceedings of the 26th International Conference on Machine Learning (ICML), 2009.

[7] A. Jensen. Markoff chains as an aid in the study of Markoff processes. Skand. Aktuarietidskr., 36:87–91, 1953.

[8] V. Rao and Y. W. Teh. Fast MCMC sampling for Markov jump processes and continuous time Bayesian networks. In Proceedings of the International Conference on Uncertainty in Artificial Intelligence, 2011.

[9] D. R. Cox.
The statistical analysis of dependencies in point processes. In P. A. Lewis, editor, Stochastic Point Processes, pages 55–56. Wiley, New York, 1972.

[10] Robert E. Kass and Valérie Ventura. A spike-train probability model. Neural Computation, 13(8):1713–1720, 2001.

[11] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

[12] M. Berman. Inhomogeneous and modulated gamma processes. Biometrika, 68(1):143, 1981.

[13] Yosihiko Ogata. On Lewis' simulation method for point processes. IEEE Transactions on Information Theory, 27(1):23–31, 1981.

[14] Mark Berman and T. Rolf Turner. Approximating point process likelihoods with GLIM. Journal of the Royal Statistical Society, Series C (Applied Statistics), 41(1):31–38, 1992.

[15] I. Sahin. A generalization of renewal processes. Operations Research Letters, 13(4):259–263, May 1993.

[16] Emery N. Brown, Riccardo Barbieri, Valérie Ventura, Robert E. Kass, and Loren M. Frank. The time-rescaling theorem and its application to neural spike train data analysis. Neural Computation, 14(2):325–346, February 2002.

[17] I. Gerhardt and B. L. Nelson. Transforming renewal processes for simulation of nonstationary arrival processes. INFORMS Journal on Computing, 21(4):630–640, April 2009.

[18] Bo Henry Lindqvist. Nonparametric estimation of time trend for repairable systems data. In V. V. Rykov, N. Balakrishnan, and M. S. Nikulin, editors, Mathematical and Statistical Models and Methods in Reliability, Statistics for Industry and Technology, pages 277–288. Birkhäuser Boston, 2011.

[19] P. A. W. Lewis and G. S. Shedler. Simulation of nonhomogeneous Poisson processes with degree-two exponential polynomial rate function. Operations Research, 27(5):1026–1040, September 1979.

[20] J. F. C. Kingman. Poisson Processes, volume 3 of Oxford Studies in Probability.
The Clarendon Press, Oxford University Press, New York, 1993. Oxford Science Publications.

[21] J. George Shanthikumar. Uniformization and hybrid simulation/analytic models of renewal processes. Operations Research, 34:573–580, July 1986.

[22] Iain Murray, Ryan Prescott Adams, and David J. C. MacKay. Elliptical slice sampling. JMLR: W&CP, 9, 2010.

[23] Iain Murray and Ryan Prescott Adams. Slice sampling covariance hyperparameters of latent Gaussian models. In Advances in Neural Information Processing Systems 23, 2010.

[24] R. G. Jarrett. A note on the intervals between coal-mining disasters. Biometrika, 66(1):191–193, 1979.

[25] Ariel Rokem, Sebastian Watzl, Tim Gollisch, Martin Stemmler, and Andreas V. M. Herz. Spike-timing precision underlies the coding efficiency of auditory receptor neurons. Journal of Neurophysiology, pages 2541–2552, 2006.

[26] Martyn Plummer, Nicky Best, Kate Cowles, and Karen Vines. CODA: Convergence diagnosis and output analysis for MCMC. R News, 6(1):7–11, March 2006.