{"title": "Prediction of Spatial Point Processes: Regularized Method with Out-of-Sample Guarantees", "book": "Advances in Neural Information Processing Systems", "page_first": 11942, "page_last": 11951, "abstract": "A spatial point process can be characterized by an intensity function which predicts the number of events that occur across space. In this paper, we develop a method to infer predictive intensity intervals by learning a spatial model using a regularized criterion. We prove that the proposed method exhibits out-of-sample prediction performance guarantees which, unlike standard estimators, are valid even when the spatial model is misspecified. The method is demonstrated using synthetic as well as real spatial data.", "full_text": "Prediction of Spatial Point Processes:\n\nRegularized Method with Out-of-Sample Guarantees\n\nMuhammad Osama\u02da\n\nmuhammad.osama@it.uu.se\n\nDave Zachariah\u02da\n\ndave.zachariah@it.uu.se\n\nPeter Stoica\u02da\n\npeter.stoica@it.uu.se\n\n*Division of System and Control, Department of Information Technology, Uppsala University\n\nAbstract\n\nA spatial point process can be characterized by an intensity function which predicts\nthe number of events that occur across space. In this paper, we develop a method to\ninfer predictive intensity intervals by learning a spatial model using a regularized\ncriterion. We prove that the proposed method exhibits out-of-sample prediction\nperformance guarantees which, unlike standard estimators, are valid even when the\nspatial model is misspeci\ufb01ed. The method is demonstrated using synthetic as well\nas real spatial data.\n\n1\n\nIntroduction\n\nSpatial point processes can be found in a range of applications from astronomy and biology to ecology\nand criminology. 
These processes can be characterized by a nonnegative intensity function \u03bbpxq\nwhich predicts the number of events that occur across space parameterized by x P X [8, 4].\nA standard approach to estimate the intensity function of a process is to use nonparametric kernel\ndensity-based methods [6, 7]. These smoothing techniques require, however, careful tuning of kernel\nbandwidth parameters and are, more importantly, subject to selection biases. That is, in regions\nwhere no events have been observed, the intensity is inferred to be zero and no measure is readily\navailable for a user to assess the uncertainty of such predictions. More advanced methods infer the\nintensity by assuming a parameterized model of the data-generating process, such as inhomogeneous\nPoisson point process models. One popular model is the log-Gaussian Cox process (LGCP) model [9]\nwhere the intensity function is modeled as a Gaussian process [22] via a logarithmic link function\nto ensure non-negativity. However, the in\ufb01nite dimensionality of the intensity function makes this\nmodel computationally prohibitive and substantial effort has been devoted to develop more tractable\napproximation methods based on gridding [9, 13], variational inference [15, 12], Markov chain\nMonte Carlo [2] and Laplace approximations [20] for the log and other link functions. A more\nfundamental problem remains in that their resulting uncertainty measures are not calibrated to the\nactual out-of-sample variability of the number of events across space. Poor calibration consequently\nleads to unreliable inferences of the process.\nIn this paper, we develop a spatially varying intensity interval with provable out-of-sample perfor-\nmance guarantees. 
Our contributions can be summarized as follows:

• the interval reliably covers out-of-sample events with a specified probability by building on
the conformal prediction framework [19],

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

\f• it is constructed using a predictive spatial Poisson model with provable out-of-sample
accuracy,

• its size appropriately increases in regions with missing data to reflect inherent uncertainty
and mitigate sampling biases,

• the statistical guarantees remain valid even when the assumed Poisson model is misspecified.

Thus the proposed method yields both reliable and informative predictive intervals under a wider
range of conditions than standard methods, e.g. LGCP [9], which depend on the assumed model
matching the unknown data-generating process.
Notation: E_n[a] = n^{−1} Σ_{i=1}^{n} a_i denotes the sample mean of a. The element-wise Hadamard
product is denoted ⊙.

2 Problem formulation

Figure 1: Unknown intensity function λ(x) (solid) expressed in number of counts per unit of area,
across a one-dimensional spatial domain X = [0, 200] which is discretized into 50 regions. Intensity
interval Λ_α(x) with 1 − α = 80% out-of-sample coverage (3) inferred using n = 50 samples.
Estimated intensity function λ̂(x) (dashed). Data is missing in the regions [30, 80] and [160, 200],
where the intensity interval increases appropriately.

The intensity function λ(x) of a spatial process is expressed as the number of events per unit area
and varies over a spatial domain of interest, X, which we equipartition into R disjoint regions:
X = ∪_{r=1}^{R} X_r ⊂ R^d, which is a common means of modelling continuous inhomogeneous point
processes, see [9, 13]. The function λ(x) determines the expected number of events y ∈ {0, . . .
, Y}
that occur in region X_r by

E[y|r] = ∫_{X_r} λ(x) dx,   (1)

where r is the region index and Y is the maximum number of counts.
We observe n independent samples drawn from the process,

(r_i, y_i) ∼ p(r)p(y|r),   (2)

where the data-generating distribution is unknown. Let the collection of pairwise datapoints be
denoted (r, y) = {(r_1, y_1), . . . , (r_n, y_n)}. Given this dataset, our goal is to infer an intensity interval
Λ(x) ⊂ [0, ∞) of the unknown spatial point process, which predicts the number of events per unit
area at location x. See Figure 1 for an illustration in one-dimensional space. A reliable interval
Λ_α(x) will cover a new out-of-sample observation y in a region r with a probability of at least 1 − α.
That is, for a specified level α the out-of-sample coverage is

Pr{ y ∈ Λ_α(x)|X_r|, ∀x ∈ X_r } ≥ 1 − α,   (3)

where |X_r| is the area of the rth region. Since the trivial noninformative interval [0, ∞) also satisfies
(3), our goal is to construct Λ_α(x) that is both reliable and informative.

\f3 Inference method

We begin by showing that an intensity interval Λ_α(x) with reliable out-of-sample coverage can
be constructed using the conformal prediction framework [19]. Note that obtaining tractable and
informative intervals in this approach requires learning an accurate predictor in a computationally
efficient manner. We develop such a predictor and prove that it has finite-sample and distribution-free
performance guarantees. These guarantees are independent of the manner in which space is
discretized.

3.1 Conformal intensity intervals
Let E_θ[y|r] denote a predictor parameterized by a vector θ. 
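To make the sampling model (1)-(2) concrete, the following minimal sketch discretizes a one-dimensional domain X = [0, 100] into R = 20 regions and draws Poisson counts whose means are the per-region integrals of the intensity. The intensity is the synthetic one used later in Section 4.1; the uniform region distribution p(r) is an assumption made only for this sketch.

```python
import numpy as np

# Illustrative sketch of the sampling model (1)-(2) on a discretized 1-D domain,
# using the synthetic intensity lambda(x) = 10 exp(-x/50) of Section 4.1.
# The uniform p(r) below is an assumption made for the sketch.
rng = np.random.default_rng(0)

R, x_max = 20, 100.0
edges = np.linspace(0.0, x_max, R + 1)     # boundaries of the regions X_1, ..., X_R

# E[y|r] = integral of lambda over X_r, computed exactly from the
# antiderivative of 10 exp(-x/50), which is -500 exp(-x/50).
F = lambda x: -500.0 * np.exp(-x / 50.0)
mean_counts = F(edges[1:]) - F(edges[:-1])

# Draw n independent samples (r_i, y_i) ~ p(r) p(y|r), with p(y|r) Poisson.
n = 50
r = rng.integers(0, R, size=n)             # region indices r_i
y = rng.poisson(mean_counts[r])            # counts y_i
```

Here `mean_counts[r]` plays the role of E[y|r] in (1), so the total expected count over X equals the integral of the intensity over the whole domain.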
For a given region r, consider a new data
point (r, ỹ), where ỹ represents the number of counts and takes a value in [0, Y]. The principle of
conformal prediction is to quantify how well this new point conforms to the observed data (r, y).
This is done by first fitting parameters θ′ to the augmented set (r, y) ∪ (r, ỹ) and then using the score

π(ỹ) = (1/(n + 1)) Σ_{i=1}^{n+1} I( e_i ≤ |ỹ − E_θ′[y|r]| ) ∈ (0, 1],   (4)

where I(·) is the indicator function and e_i = |y_i − E_θ′[y|r_i]| are residuals for all observed data
points i = 1, . . . , n. When a new residual |ỹ − E_θ′[y|r]| is statistically indistinguishable from
the rest, π(ỹ) corresponds to a p-value [19]. On this basis we construct an intensity interval Λ_α(x)
by including all points ỹ that conform to the dataset with significance level α, as summarized in
Algorithm 1. Using [14, thm. 2.1], we can prove that Λ_α(x) satisfies the out-of-sample coverage (3).

Algorithm 1 Conformal intensity interval
1: Input: Location x, significance level α, data (r, y)
2: for all ỹ ∈ {0, . . . , Y} do
3: Set r if x ∈ X_r
4: Update predictor E_θ[y|r] using augmented data (r, y) ∪ (r, ỹ)
5: Compute score π(ỹ) in (4)
6: end for
7: Output: Λ_α(x) = {ỹ : (n + 1)π(ỹ) ≤ ⌈(n + 1)α⌉}/|X_r|

While this approach yields reliable out-of-sample coverage guarantees, there are two possible
limitations:

1. The residuals can be decomposed as e = (E[y|r] − E_θ[y|r]) + ε, where the term in
brackets is the model approximation error and ε is an irreducible zero-mean error. Obtaining
informative Λ_α(x) across space requires learned predictors with small model approximation
errors.
2. 
Learning methods that are computationally demanding render the computation of Λ_α(x)
intractable across space, since the conformal method requires re-fitting the predictor multiple
times for each region.

Next, we focus on addressing both limitations.

3.2 Spatial model
We seek an accurate model p_θ(y|r) of p(y|r), parameterized by θ. For a given r, we quantify the
out-of-sample accuracy of a model by the Kullback-Leibler divergence per sample,

R(θ) = (1/n) E_{y|r}[ ln( p(y|r) / p_θ(y|r) ) ] ≥ 0, for which R(θ) = 0 ⇔ p_θ(y|r) ≡ p(y|r).   (5)

In general, the unknown intensity function underlying p(y|r) has a local spatial structure and can
be modeled as smooth, since we expect counts in neighbouring regions to be similar in real-world
applications. On this basis, we consider the following class of models,

P_θ = { p_θ(y|r) is Poisson with mean E_θ[y|r] = exp(φ^T(r)θ), θ ∈ R^R },

where φ(r) is an R × 1 spatial basis vector whose components are given by the cubic b-spline function
[21] (see supplementary material). The Poisson distribution is the maximum entropy distribution
for count data and is here parameterized via a latent field {θ_1, . . . , θ_R} across regions [4, ch. 4.3].
Using a cubic b-spline basis [21], we model the mean in region r via a weighted average φ^T(r)θ
of latent parameters from neighbouring regions, where the maximum weight in φ(r) is less than 1.
This parameterization yields locally smooth spatial structures and is similar to using a latent process
model for the mean as in the commonly used LGCP model [9, sec. 
4.1].
The unknown optimal predictive Poisson model is given by

θ* = arg min_θ R(θ)   (6)

and has an out-of-sample accuracy R(θ*).

3.3 Regularized learning criterion
We propose learning a spatial Poisson model in P_θ using the following learning criterion

θ̂ = arg min_θ −n^{−1} ln p_θ(y|r) + n^{−γ} ||w ⊙ θ||_1,   (7)

where ln p_θ(y|r) is the log-likelihood, which is convex [18], and w is a given vector of regularization
weights. The regularization term in (7) not only mitigates overfitting of the model by penalizing
parameters in θ individually, it also yields the following finite-sample and distribution-free result.

Theorem 1 Let γ ∈ (0, 1/2). Then the out-of-sample accuracy of the learned model is bounded as

R(θ̂) ≤ R(θ*) + 2n^{−γ} ||w ⊙ θ*||_1   (8)

with a probability of at least max{ 0, 1 − 2R exp(−w_o² n^{1−2γ}/(2Y²)) }, where w_o = min_{k=1,...,R} w_k.

We provide an outline of the proof in Section 3.3.1, while relegating the details to the Supplementary
Material. The above theorem guarantees that the out-of-sample accuracy R(θ̂) of the learned model
(7) will be close to R(θ*) of the optimal model (6), even if the model class (3.2) does not contain the
true data-generating process. As γ is increased, the bound tightens and the probabilistic guarantee
weakens, but for a given data set one can readily search for the value of γ ∈ (0, 0.5) which yields the
most informative interval Λ_α(x).
The first term of (7) contains inner products φ^T(r)θ which are formed using a regressor matrix.
To balance fitting with the regularizing term in (7), it is common to rescale all columns of the
regressor matrix to unit norm. 
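As a concrete sketch, a minimizer of criterion (7) can be computed with a simple proximal-gradient loop (an assumed stand-in for the paper's MM and coordinate-descent solver), using for simplicity a one-hot basis φ(r) = e_r instead of the cubic b-splines, so that E_θ[y|r] = exp(θ_r):

```python
import numpy as np

# Sketch: minimizing criterion (7) by proximal gradient descent, an assumed
# stand-in for the paper's MM/coordinate-descent algorithm, with a one-hot
# basis phi(r) = e_r (a simplifying assumption) so E_theta[y|r] = exp(theta_r).
def fit_regularized_poisson(r, y, R, gamma=0.4, steps=2000, lr=0.1):
    n = len(y)
    w = np.sqrt(np.bincount(r, minlength=R) / n)   # w_k = sqrt(E_n[|phi_k(r)|^2])
    lam = n**(-gamma) * w                          # per-coordinate l1 penalty
    theta = np.zeros(R)
    for _ in range(steps):
        mu = np.exp(theta)
        # gradient of n^{-1} * sum_i [exp(theta_{r_i}) - y_i * theta_{r_i}]
        grad = np.bincount(r, weights=mu[r] - y, minlength=R) / n
        theta -= lr * grad
        # proximal (soft-thresholding) step for n^{-gamma} ||w (.) theta||_1
        theta = np.sign(theta) * np.maximum(np.abs(theta) - lr * lam, 0.0)
    return theta

# Usage on counts with a common mean of 10 events per region:
rng = np.random.default_rng(1)
r = rng.integers(0, 10, size=500)
y = rng.poisson(10.0, size=500)
theta_hat = fit_regularized_poisson(r, y, R=10)    # exp(theta_hat) close to 10
```

The soft-thresholding step is where the weighted l1 term acts; setting `gamma` closer to 1/2 weakens the penalty as n grows, mirroring the n^{−γ} scaling in (7).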
An equivalent way is to choose the following regularization weights\nwk \u201c\n\na\nEnr|\u03c6kprq|2s, see e.g. [3]. We then obtain a predictor as\n\nand predictive intensity interval \u039b\u03b1pxq via Algorithm 1. Setting wk \u201d 0 in (7) yields a maximum\nlikelihood model with less informative intervals, as we show in the numerical experiments section.\n\nEp\u03b8ry|rs \u201c expp\u03c6Jprqp\u03b8q\nThe minimizerp\u03b8 in (7) satis\ufb01espRpp\u03b8q \u010f pRp\u03b8\u2039q ` \u03c1fp\u03b8\u2039q \u00b4 \u03c1fpp\u03b8q,\nwhere pRp\u03b8q \u201c n\u00b41 ln ppy|rq\n\n3.3.1 Proof of theorem\n\nand \u03c1 \u201c n\u00b4\u03b3.\n\n(9)\np\u03b8py|rq is the in-sample divergence, corresponding to (5), fp\u03b8q \u201c ||w d \u03b8||1\n\n4\n\n\f\u0159\n\n\u03b8 ` 1\nn\n\nUsing the functional form of the Poisson distribution, we have\n\n\u00b4 ln p\u03b8pyi|riq \u201c n\u00ff\n\n\u00b4 ln p\u03b8py|rq \u201c n\u00ff\n\u201d\nRp\u03b8q \u00b4 pRp\u03b8q \u201c 1\n\u201d\nln p\u03b8py|rq \u00b4 Ey|rrln p\u03b8py|rqs ` Ey|rrln ppy|rqs \u00b4 ln ppy|rq\npy \u00b4 Ey|rrysq\u03c6prq\n\nE\u03b8ryi|ris \u00b4 yi lnpE\u03b8ryi|risq ` lnpyi!q\n\u0131\n\nThen the gap between the out-of-sample and in-sample divergences for any given model \u03b8 is given\nby\n\nn\n\u201c En\n\n\u0131J\n\n(10)\n\ni\u201c1\n\ni\u201c1\n\nK,\n\n\u00b4\n\n\u2030\n\nn\ni\u201c1\n\nwhere\n\nand we can therefore relate the gaps for the optimal modelp\u03b8 with the learned model \u03b8\u2039 as follows:\n\nwhere the second line follows from using our Poisson model P\u03b8 and K \u201c Ey|rrln ppy|rqs \u00b4\nEy|rrlnpyi!qs \u00b4 lnpyi!q is a constant. Note that the divergence gap is linear in \u03b8,\nln ppy|rq `\n\u201c\n\u201c\n\u2030\nRpp\u03b8q \u00b4 pRpp\u03b8q\nRp\u03b8\u2039q \u00b4 pRp\u03b8\u2039q\n\u201d\n\u02c7\u02c7\ng \u201d B\u03b8rRp\u03b8q \u00b4 pRp\u03b8qs\n\u03b8\u201cp\u03b8 \u201c\nEnrz1s, . . . 
, EnrzRs\n\n\u201c gJp\u03b8\u2039 \u00b4p\u03b8q,\n\u0131J\n\nis the gradient of (10) and we introduce the random variable zk \u201c py \u00b4 Ey|rrysq\u03c6kprq P r\u00b4Y, Y s\nfor notational simplicity (see supplementary material).\nInserting (9) into (11) and re-arranging yields\n\nRpp\u03b8q \u010f Rp\u03b8\u2039q \u00b4 gJp\u03b8\u2039 \u00b4p\u03b8q ` \u03c1fp\u03b8\u2039q \u00b4 \u03c1fpp\u03b8q,\n\nwhere the RHS is dependent onp\u03b8. Next, we upper bound the RHS by a constant that is independent\nofp\u03b8.\n\n(11)\n\n(12)\n\n,\n\nThe weighted norm fp\u03b8q has an associated dual norm\ngJ\u03b8 \u201d ||g||8\n\nrfpgq \u201c sup\n\u00b4gJ\u03b8\u2039 \u010f rfpgqfp\u03b8\u2039q\n\n\u03b8:fp\u03b8q\u010f1\n\n\u201c max\nk\u201c1,...,R\n\n|Enrzks|\n\nwo\n\nwo\n\nand gJp\u03b8 \u010f rfpgqfpp\u03b8q\n\nsee the supplementary material. Using the dual norm, we have the following inequalities\n\nand combining them with (12), as in [23], yields\n\nRpp\u03b8q \u010f Rp\u03b8\u2039q ` p\u03c1 ` rfpgqqfp\u03b8\u2039q ` prfpgq \u00b4 \u03c1qfpp\u03b8q \u010f Rp\u03b8\u2039q ` 2\u03c1fp\u03b8\u2039q\n\nwhen \u03c1 \u011b rfpgq. The probability of this event is lower bounded by\n\n`\n\n\u02d8\n\u03c1 \u011b rfpgq\n\nPr\n\n\u011b 1 \u00b4 2R exp\n\nWe derive this bound using Hoeffding\u2019s inequality, for which\n\n\u201c\n\n\u00b4\nand Erzks \u201c Er\n\nPrp|Enrzks \u00b4 Erzks| \u010f \u0001q \u011b 1 \u00b4 2 exp\n\u00af\n\u201c 0. Moreover,\n\n\u2030\n\u00b4 R\u010d\npEy|rrys \u00b4 Ey|rrysq\u03c6kprq\n\u201c Pr\n\n|Enrzks| \u010f \u0001\n\n|Enrzks| \u010f \u0001\n\n\u00af\n\nPr\n\nmax\n\nk\u201c1,...,R\n\nk\u201c1\n\n\u201d\n\n\u0131\n\n,\n\n\u00b4 n\u00012\n2Y 2\n\n\u011b 1 \u00b4 2R exp\n\nusing DeMorgan\u2019s law and the union bound (see supplementary material). Setting \u0001 \u201c wo\u03c1, we\nobtain (14) Hence equation (13) and (14) prove Theorem 1. 
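The probability guarantee in Theorem 1 can be evaluated numerically. A quick check with illustrative constants (w_o = 1, Y = 5, R = 10, γ = 0.3; these are assumptions for the sketch, not values from the paper's experiments):

```python
import numpy as np

# Numeric illustration of the probability guarantee in Theorem 1,
#   max{0, 1 - 2R exp(-w_o^2 n^(1-2*gamma) / (2 Y^2))},
# using illustrative values w_o = 1, Y = 5, R = 10 and gamma = 0.3
# (assumptions for the sketch, not values from the paper's experiments).
def theorem1_bound(n, gamma, R=10, Y=5, w_o=1.0):
    return max(0.0, 1.0 - 2.0 * R * np.exp(-w_o**2 * n**(1.0 - 2.0 * gamma) / (2.0 * Y**2)))

# For gamma < 1/2 the exponent of n is positive, so the guarantee
# approaches 1 as the sample size n grows.
probs = [theorem1_bound(n, gamma=0.3) for n in (10**2, 10**4, 10**6, 10**8)]
```

For small n the bound is vacuous (clipped at 0), and it climbs towards 1 as n increases, which is the behaviour the theorem promises for any γ ∈ (0, 1/2).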
It can be seen that for γ ∈ (0, 1/2), the
probability bound on the right-hand side increases with n.

\f3.3.2 Minimization algorithm

To solve the convex minimization problem (7) in a computationally efficient manner, we use
a majorization-minimization (MM) algorithm. Specifically, let V(θ) = −n^{−1} ln p_θ(y|r) and
f(θ) = ||w ⊙ θ||_1; then we bound the objective in (7) as

V(θ) + n^{−γ} f(θ) ≤ Q(θ; θ̃) + n^{−γ} f(θ),   (16)

where Q(θ; θ̃) is a quadratic majorizing function of V(θ) such that Q(θ̃; θ̃) = V(θ̃), see [18, ch. 5].
Minimizing the right-hand side of (16) takes the form of a weighted lasso regression and can therefore
be solved efficiently using coordinate descent. The pseudo-code is given in Algorithm 2; see the
supplementary material for details. The runtime of Algorithm 2 scales as O(nR²), i.e., linear in the
number of datapoints n. This computational efficiency of Algorithm 2 is leveraged in Algorithm 1
when updating the predictor E_θ[y|r] with an augmented dataset (r, y) ∪ (r, ỹ). This renders the
computation of Λ_α(x) tractable across space.

Algorithm 2 Majorization-minimization method
1: Input: Data (r, y), parameter γ ∈ (0, 1/2) and Y
2: Form weights w_k = √(E_n[|φ_k(r)|²]) for k = 1, . . . , R
3: Set θ̃ := 0
4: while ||θ̃ − θ̌|| ≥ ε do
5: Form quadratic approximation at θ̃: Q(θ; θ̃) + n^{−γ}||w ⊙ θ||_1
6: Solve θ̌ := arg min_θ Q(θ; θ̃) + n^{−γ}||w ⊙ θ||_1 using coordinate descent
7: θ̃ := θ̌
8: end while
9: Output: θ̂ = θ̌ and E_θ̂[y|r] = exp(φ^T(r)θ̂)

The code for Algorithms 1 and 2 is provided on GitHub.

4 Numerical experiments

We demonstrate the proposed method using both synthetic and real spatial data.

4.1 Synthetic data with missing regions

To illustrate the performance of our learning criterion in (7), we begin by considering a one-
dimensional spatial domain X = [0, 100], equipartitioned into R = 20 regions. Throughout we
use γ = 0.499 in (7).
Comparison with log-Gaussian Cox process model
We consider a process described by the intensity function

λ(x) = 10 exp(−x/50),   (17)

and sample events using a spatial Poisson process model using inversion sampling [5]. The distribution
p(y|r) is then Poisson. Using a realization (r, y), we compare our predictive intensity interval Λ_α(x)
with a (1 − α)%-credibility interval Λ̃_α(x) obtained by assuming an LGCP model for λ(x) [9]
and approximating its posterior belief distribution using integrated nested Laplace approximation
(INLA) [17, 11]. For the cubic b-splines in P_θ, the spatial support of the weights in φ(r) was set to
cover all regions.
We consider interpolation and extrapolation cases where the data is missing across [30, 80] and
[70, 100], respectively. Figures 2a and 2b show the intervals in both cases. While Λ̃_α(x) is tighter than
Λ_α(x) in the missing data regions, it has no out-of-sample guarantees and therefore lacks reliability.
This is critically evident in the extrapolation case, where Λ_α(x) becomes noninformative further
away from the observed data regions. By contrast, Λ̃_α(x) provides misleading inferences in this case.

\f(a) Interpolation with data missing in [30, 80]
(b) Extrapolation with data missing in [70, 100]
(c) Average interval size with data missing in [50, 90]

Figure 2: (a) Interpolation and (b) extrapolation with Λ_α(x) (grey) and Λ̃_α(x) (green) with 1 − α =
80%, for a given realization of point data (black crosses). The unknown intensity function λ(x)
(red) gives the expected number of events in a region, see (1). (c) Misspecified case with average
intensity interval size |Λ_α(x)|, using nonzero (blue) and zero (red) regularization weights in (7). Data
in [50, 90] is missing. The different markers correspond to three different spatial processes, with
intensity functions λ1(x), λ2(x) and λ3(x). The out-of-sample coverage (3) was set to be at least
1 − α = 80% and the empirical coverage is given in Table 1.

Empirical coverage of Λ_α(x) [%], α = 0.2

        Proposed   Unregularized
λ1       97.05        97.37
λ2       91.05        98.32
λ3       81.37        95.37

Table 1: Comparison of empirical coverage of Λ_α(x), using the proposed regularized vs. the
unregularized maximum likelihood method. 
We target ≥ 1 − α = 80% coverage.

Comparison with unregularized maximum likelihood model
Next, we consider three different spatial processes, described by the intensity functions

λ1(x) = (500/√(2π·25²)) exp(−(x − 50)²/(2·25²)), λ2(x) = 5 sin(2πx/50) + 5, λ3(x) = (3/8)√x.

For the first process, the intensity peaks at x = 50; the second process is periodic with a period of 50
spatial units; and for the third process the intensity grows monotonically with space x. In all three
cases, the number of events in a given region is then drawn as y ∼ p(y|r) using a negative binomial
distribution, with mean given by (1) and number of failures set to 100, yielding a dataset (r, y). Note
that the Poisson model class P_θ is misspecified here.
We set the nominal out-of-sample coverage ≥ 80% and compare the interval sizes |Λ_α(x)| across
space and the overall empirical coverage, when using the regularized and unregularized criteria (7),
respectively. The averages are formed using 50 Monte Carlo simulations.
Figure 2c and Table 1 summarize the results of the comparison between the regularized and unregularized
approaches for the three spatial processes. 
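The misspecified setup above, combined with the conformal construction of Algorithm 1, can be sketched as follows. The predictor is a simple per-region sample mean, an assumed stand-in for the paper's regularized spline-based Poisson model, and the inclusion rule is written in the standard conformal form (keep ỹ whose rank-based p-value exceeds α, cf. [14]):

```python
import numpy as np

# Sketch: conformal intensity set for counts generated under the misspecified
# model, with mean from lambda2(x) = 5 sin(2*pi*x/50) + 5 but drawn negative
# binomial with 100 "failures". NumPy's negative_binomial(n, p) has mean
# n(1-p)/p, so p = n/(n + mean). The per-region mean predictor below is an
# assumed stand-in for the paper's regularized model.
rng = np.random.default_rng(0)

R, edges = 20, np.linspace(0.0, 100.0, 21)
area = np.diff(edges)[0]
mids = 0.5 * (edges[:-1] + edges[1:])
mean_counts = (5.0 * np.sin(2.0 * np.pi * mids / 50.0) + 5.0) * area

n = 200
r = rng.integers(0, R, size=n)
p = 100.0 / (100.0 + mean_counts[r])
y = rng.negative_binomial(100, p)               # one count per observation

def predictor(r_aug, y_aug):
    """Per-region mean with a global fallback (a simplifying assumption)."""
    mu = np.full(R, y_aug.mean())
    for k in np.unique(r_aug):
        mu[k] = y_aug[r_aug == k].mean()
    return mu

def conformal_interval(region, alpha=0.2, Y=80):
    keep = []
    for y_tilde in range(Y + 1):
        r_aug = np.append(r, region)            # augmented dataset (r,y) U (r,y~)
        y_aug = np.append(y, y_tilde)
        mu = predictor(r_aug, y_aug)
        e = np.abs(y_aug - mu[r_aug])           # residuals; new point is e[-1]
        pi = np.mean(e >= e[-1])                # rank-based p-value, cf. score (4)
        if pi > alpha:                          # keep y~ that conform at level alpha
            keep.append(y_tilde)
    return np.array(keep) / area                # intensity values, counts per unit length

interval = conformal_interval(region=2)
```

Each candidate count triggers one refit, which is why the computational efficiency of the underlying learning algorithm matters for making the interval tractable across space.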
While both intervals meet the targeted out-of-sample
coverage (Table 1), the unregularized method results in intervals that are nearly four times larger than
those of the proposed method (Figure 2c) in the missing region.

4.2 Real data

In this section we demonstrate the proposed method using two real spatial data sets. In two-
dimensional space it is challenging to illustrate a varying interval Λ_α(x), so for clarity we show its
maximum value, minimum value and size, as well as compare it with a point estimate obtained from
the predictor, i.e.,

λ̂(x) = Σ_{r=1}^{R} I(x ∈ X_r) E_θ̂[y|r] / |X_r|.   (18)

Throughout we use γ = 0.4 in (7).

\f(a) λ̂(x)   (b) max Λ_α(x)   (c) min Λ_α(x)

Figure 3: # trees per m². Nominal coverage set to 1 − α = 80%. The dashed boxes mark missing
data regions.

(a) |Λ_α(x)|   (b) |Λ̃_α(x)|

Figure 4: # trees per m². Comparison between the proposed intensity interval and the credibility
interval from the approximate posterior of the LGCP model.

Hickory tree data
First, we consider the hickory trees data set [1], which consists of coordinates of hickory trees in a
spatial domain X ⊂ R², shown in Figure 3a, that is equipartitioned into a regular lattice of R = 52
hexagonal regions. The dataset (r, y) contains the observed number of trees in each region. The
dashed boxes indicate regions inside which data is considered to be missing. For the cubic b-splines
in P_θ, the spatial support was again set to cover all regions.

We observe that the point predictor λ̂(x) interpolates and extrapolates smoothly across regions and
appears to visually conform to the density of the point data. Figures 3b and 3c provide important
complementary information using Λ_α(x), whose upper limit increases in the missing data regions,
especially when extrapolating in the bottom-right corner, and lower limit rises in the dense regions.
The size of the interval |Λ_α(x)| quantifies the predictive uncertainty and we compare it to the
(1 − α)% credibility interval |Λ̃_α(x)| using the LGCP model as above, cf. Figures 4a and 4b. We note
that the sizes increase in different ways for the missing data regions. For the top missing data region,
|Λ̃_α(x)| is virtually unchanged in contrast to |Λ_α(x)|. While |Λ̃_α(x)| appears relatively tighter than
|Λ_α(x)| across the bottom-right missing data regions, the credible interval lacks any out-of-sample
guarantees that would make the prediction reliable.
Crime data
Next we consider crime data in Portland police districts [16, 10], which consists of locations of
calls-of-service received by Portland Police between January and March 2017 (see Figure 5a). The
spatial region X ⊂ R² is equipartitioned into a regular lattice of R = 494 hexagonal regions.
The dataset (r, y) contains the reported number of crimes in each region. The support of the cubic
b-spline is taken to be 12 km.

The point prediction λ̂(x) is shown in Figure 5a, while Figures 5b and 5c plot the upper and lower
limits of Λ_α(x), respectively. We observe that λ̂(x) follows the density of the point pattern well,
predicting a high intensity of approximately 60 crimes/km² in the center. Moreover, upper and lower
limits of Λ_α(x) are both high where point data is dense. 
The interval tends to be noninformative
for regions far away from those with observed data, as is visible in the top-left corner when comparing
Figures 5b and 5c.

\f(a) λ̂(x)   (b) max Λ_α(x)   (c) min Λ_α(x)

Figure 5: # crimes per km² in Portland, USA. Nominal coverage set to 1 − α = 80%.

5 Conclusion

We have proposed a method for inferring predictive intensity intervals for spatial point processes. The
method utilizes a spatial Poisson model with an out-of-sample accuracy guarantee and the resulting
interval has an out-of-sample coverage guarantee. Both properties hold even when the model is
misspecified. The intensity intervals provide a reliable and informative measure of uncertainty of the
point process. Their size is small in regions with observed data and grows in missing regions farther
away from data. The proposed regularized learning criterion also leads to more informative intervals
than an unregularized maximum likelihood approach, while its statistical guarantees
render it reliable in a wider range of conditions than standard methods such as LGCP inference. The
method was demonstrated using both real and synthetic data.

Acknowledgments
The work was supported by the Swedish Research Council (contract numbers 2017-04610 and
2018-05040).

References
[1] P. J. Diggle, Lancaster University. https://www.lancaster.ac.uk/staff/diggle/pointpatternbook/datasets/.

[2] R. P. Adams, I. Murray, and D. J. MacKay. Tractable nonparametric Bayesian inference
in Poisson processes with Gaussian process intensities. In Proceedings of the 26th Annual
International Conference on Machine Learning, pages 9–16. 
ACM, 2009.

[3] A. Belloni, V. Chernozhukov, and L. Wang. Square-root lasso: pivotal recovery of sparse signals
via conic programming. Biometrika, 98(4):791–806, 2011.

[4] N. Cressie and C. K. Wikle. Statistics for spatio-temporal data. John Wiley & Sons, 2015.

[5] L. Devroye. Sample-based non-uniform random variate generation. In Proceedings of the 18th
Conference on Winter Simulation, pages 260–265. ACM, 1986.

[6] P. J. Diggle. A kernel method for smoothing point process data. Journal of the Royal Statistical
Society: Series C (Applied Statistics), 34(2):138–147, 1985.

[7] P. J. Diggle. Statistical analysis of spatial point patterns. Hodder Education Publishers, 2003.

[8] P. J. Diggle. Statistical analysis of spatial and spatio-temporal point patterns. Chapman and
Hall/CRC, 2013.

[9] P. J. Diggle, P. Moraga, B. Rowlingson, B. M. Taylor, et al. Spatial and spatio-temporal log-
Gaussian Cox processes: extending the geostatistical paradigm. Statistical Science, 28(4):542–563, 2013.

[10] S. Flaxman, M. Chirico, P. Pereira, and C. Loeffler. Scalable high-resolution forecasting of
sparse spatiotemporal events with kernel methods: a winning solution to the NIJ "Real-Time Crime
Forecasting Challenge". arXiv preprint arXiv:1801.02858, 2018.

\f[11] J. B. Illian, S. H. Sørbye, and H. Rue. A toolbox for fitting complex spatial point process models
using integrated nested Laplace approximation (INLA). The Annals of Applied Statistics, pages
1499–1530, 2012.

[12] S. John and J. Hensman. Large-scale Cox process inference using variational Fourier features.
2018.

[13] O. O. Johnson, P. J. Diggle, and E. Giorgi. A spatially discrete approximation to log-Gaussian
Cox processes for modelling aggregated disease count data. 
arXiv preprint arXiv:1901.09551,
2019.

[14] J. Lei, M. G’Sell, A. Rinaldo, R. J. Tibshirani, and L. Wasserman. Distribution-free predictive
inference for regression. Journal of the American Statistical Association, 113(523):1094–1111,
2018.

[15] C. Lloyd, T. Gunter, M. Osborne, and S. Roberts. Variational inference for Gaussian process
modulated Poisson processes. In International Conference on Machine Learning, pages 1814–1822, 2015.

[16] National Institute of Justice. Real-time crime forecasting challenge posting. https://nij.gov/funding/Pages/fy16-crime-forecasting-challenge-document.aspx#data.

[17] H. Rue, S. Martino, and N. Chopin. Approximate Bayesian inference for latent Gaussian models
by using integrated nested Laplace approximations. Journal of the Royal Statistical Society:
Series B (Statistical Methodology), 71(2):319–392, 2009.

[18] R. Tibshirani, M. Wainwright, and T. Hastie. Statistical learning with sparsity: the lasso and
generalizations. Chapman and Hall/CRC, 2015.

[19] V. Vovk, A. Gammerman, and G. Shafer. Algorithmic learning in a random world. Springer
Science & Business Media, 2005.

[20] C. J. Walder and A. N. Bishop. Fast Bayesian intensity estimation for the permanental process.
In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages
3579–3588. JMLR.org, 2017.

[21] L. Wasserman. All of nonparametric statistics. Springer Science & Business Media, 2006.

[22] C. K. Williams and C. E. Rasmussen. Gaussian processes for machine learning, volume 2. MIT
Press, Cambridge, MA, 2006.

[23] R. Zhuang and J. Lederer. Maximum regularized likelihood estimators: A general prediction
theory and applications. 
Stat, 7(1):e186, 2018.\n", "award": [], "sourceid": 6422, "authors": [{"given_name": "Muhammad", "family_name": "Osama", "institution": "Uppsala University"}, {"given_name": "Dave", "family_name": "Zachariah", "institution": "Uppsala University"}, {"given_name": "Peter", "family_name": "Stoica", "institution": "Uppsala University"}]}