{"title": "Inferring Latent Velocities from Weather Radar Data using Gaussian Processes", "book": "Advances in Neural Information Processing Systems", "page_first": 8984, "page_last": 8993, "abstract": "Archived data from the US network of weather radars hold detailed information about bird migration over the last 25 years, including very high-resolution partial measurements of velocity. Historically, most of this spatial resolution is discarded and velocities are summarized at a very small number of locations due to modeling and algorithmic limitations. This paper presents a Gaussian process (GP) model to reconstruct high-resolution full velocity fields across the entire US. The GP faithfully models all aspects of the problem in a single joint framework, including spatially random velocities, partial velocity measurements, station-specific geometries, measurement noise, and an ambiguity known as aliasing. We develop fast inference algorithms based on the FFT; to do so, we employ a creative use of Laplace's method to sidestep the fact that the kernel of the joint process is non-stationary.", "full_text": "Inferring Latent Velocities from Weather Radar Data\n\nusing Gaussian Processes\n\nRico Angell\n\nUniversity of Massachusetts Amherst\n\nrangell@cs.umass.edu\n\nDaniel Sheldon\n\nUniversity of Massachusetts Amherst\n\nsheldon@cs.umass.edu\n\nAbstract\n\nArchived data from the US network of weather radars hold detailed information\nabout bird migration over the last 25 years, including very high-resolution partial\nmeasurements of velocity. Historically, most of this spatial resolution is discarded\nand velocities are summarized at a very small number of locations due to modeling\nand algorithmic limitations. This paper presents a Gaussian process (GP) model\nto reconstruct high-resolution full velocity \ufb01elds across the entire US. 
The GP\nfaithfully models all aspects of the problem in a single joint framework, including\nspatially random velocities, partial velocity measurements, station-speci\ufb01c\ngeometries, measurement noise, and an ambiguity known as aliasing. We develop\nfast inference algorithms based on the FFT; to do so, we employ a creative use\nof Laplace\u2019s method to sidestep the fact that the kernel of the joint process is\nnon-stationary.\n\n1 Introduction\n\nArchived data from the US network of weather radars hold valuable information about atmospheric\nphenomena across the US for over 25 years [1]. Although these radars were designed to monitor\nweather, they also detect \ufb02ying animals such as birds, bats, and insects [2]. The information\ncontained in the archive is critical to understanding phenomena ranging from extreme weather to bird\nmigration [3\u20135].\nThis paper is concerned with using radar to measure velocity, with the primary goal of gathering\ndetailed information about bird migration. Radar is the most comprehensive source of information\nabout this dif\ufb01cult-to-study phenomenon [5\u20138], but, historically, most information has gone largely\nunused due to the sheer size of the data and the dif\ufb01culty of interpreting it automatically. Recently,\nanalytical advances including machine learning [9, 10] are enabling scientists to begin to conduct\nlarger scale studies [5, 7, 11]. Radar measurements of bird migration density, direction, and speed\nare important for understanding the biology of bird migration and for guiding conservation [11\u201315].\nMachine learning methods to automate the detailed interpretation of radar data will allow scientists to\nanswer questions at the scale of the entire continent and over more than two decades.\nDoppler radars measure the rate at which objects approach or depart the radar, which gives partial\ninformation about their velocity. 
By making certain smoothness assumptions, it is possible to\nreconstruct full velocity vectors [9, 16]. However, current methods are limited by rigid smoothness\nassumptions and summarize all velocity information down to 143 points across the US (the locations\nof the radar stations) even though the original data has on the order of half a billion measurements for\none nationwide snapshot.\nThe goal of this paper is to develop a comprehensive, principled, probabilistic model, together with\nfast algorithms, to reconstruct spatially detailed velocity \ufb01elds across the US. There are three critical\nchallenges. First, radars only measure radial velocity, the component of velocity in the direction of\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\n\fthe radar beam, so the full velocity is underdetermined. Second, the measured radial velocity may be\naliased, which means it is only known up to an additive constant. Third, measurements are tied to\nstation-speci\ufb01c geometry, so it is not clear how to combine data from many stations, for example\nto \ufb01ll in gaps in coverage between stations (e.g., see Figure 1(d)). Prior research has primarily\naddressed these challenges separately, and has been unable to combine information from many radars\nto reconstruct detailed velocity \ufb01elds.\nOur \ufb01rst contribution is a joint Gaussian process (GP) to simultaneously model the radial velocity\nmeasurements from all radar stations. While it is natural to model the velocity \ufb01eld itself as a GP,\nit is not obvious how to model the collection of all station-speci\ufb01c measurements as a GP. We start\nby positing a GP on latent velocity vectors, and then derive a GP on the measurements such that the\nstation-speci\ufb01c geometry is encoded in the kernel function.\nOur second contribution is a suite of fast algorithms for inference in this GP, which allows it to\nscale to very large data sets. 
We leverage fast FFT-based algorithms for GP kernel operations for\npoints on a regular grid [17\u201319]. However, these require a stationary kernel, which, due to the\nstation-speci\ufb01c geometry, ours is not. We show how to achieve the same speed bene\ufb01ts by using\nLaplace\u2019s method (for exact inference) so that fast kernel operations can be performed in the space\nof latent velocities, where the kernel is stationary. Finally, we show how to model aliasing directly\nwithin the GP framework by employing a wrapped normal likelihood [9, 20]; this \ufb01ts seamlessly into\nour fast approach using Laplace\u2019s method.\nThe result is a \ufb01rst-of-its-kind probabilistic model that jointly models all aspects of the data generation\nand measurement process; it accepts as input the raw radial velocity measurements, and outputs\nsmooth reconstructed velocity \ufb01elds.\n\n2 Background and Problem De\ufb01nition\n\nRadar Basics. The US network of weather radars, known as \u201cNEXRAD\u201d radars, consists of 143\nradars in the continental US. Each conducts a volume scan or scan every 6 to 10 minutes, during\nwhich it rotates its antenna 360 degrees around a \ufb01xed vertical axis (one \u201csweep\u201d) at increasing\nelevation angles. The result of one scan is a set of raster data products in three-dimensional polar\ncoordinates corresponding to this scanning strategy. One measurement corresponds to a particular\nantenna position (azimuth and elevation angle) and range; the corresponding volume of atmosphere\nat this position in the polar grid is called a sample volume.\nNEXRAD radars collect up to six different data products. For our purposes the most important\nare re\ufb02ectivity and radial velocity. 
Re\ufb02ectivity measures the density of objects, speci\ufb01cally, the\ntotal cross-sectional area of objects in a sample volume that re\ufb02ect radio waves back to the radar.\nRadial velocity is the rate at which objects in a sample volume approach or depart the radar, which is\nmeasured by analyzing the frequency shift of re\ufb02ected radio waves (the \u201cDoppler effect\u201d). Radial\nvelocity is illustrated in Figure 1(a). For any given sample volume, radial velocity gives only partial\nvelocity information: the projection of the actual velocity onto a unit vector in the direction of the\nradar beam. However, if the actual velocity \ufb01eld is smooth, we can often make good inferences\nabout the full velocity. Figure 1(b) shows example radial velocity information measured from the\nKBGM radar in Binghamton, NY on the night of September 11, 2010, during which there was heavy\nbird migration. Objects approaching the radar have negative radial velocities (green), and objects\ndeparting the radar have positive radial velocities (red). We can infer from the overall pattern that\nobjects (in this case, migrating birds) are moving relatively uniformly from northeast to southwest.\nVelocity Model. To make inferences of the type in Figure 1(b) we need to simultaneously reason\nabout spatial properties of the velocity \ufb01eld and the measurement geometry. To set up this\ntype of analysis, for the $i$th sample volume within the domain of one radar station, let $a_i$ be\nthe unit vector in the direction from the radar station to the sample volume. This is given by\n$a_i = (\\cos\\phi_i \\cos\\rho_i, \\sin\\phi_i \\cos\\rho_i, \\sin\\rho_i)$, where $\\phi_i$ and $\\rho_i$ are the azimuth and elevation angles,\nrespectively. Let $z_i = (u_i, v_i, w_i)$ be the actual, unobserved, velocity vector. Then the radial velocity\nis $a_i^T z_i$, and the measured radial velocity is $y_i = a_i^T z_i + \\epsilon_i$. Here, $\\epsilon_i \\sim \\mathcal{N}(0, \\sigma^2)$ is zero-mean\nGaussian noise that plays the dual role of modeling measurement error and deviations from whatever\nprior model is chosen for the set of all $z_i$. For example, in the uniform velocity model [16], velocities\nare assumed to be constant-valued within \ufb01xed height bins above ground level within the domain of\none radar station, which is a very rigid uniformity assumption. Reported values for the noise standard\ndeviation are $\\sigma \\in [2, 6]$ $\\mathrm{m\\,s^{-1}}$ for birds, and $\\sigma < 2$ $\\mathrm{m\\,s^{-1}}$ for precipitation [7].\n\nFigure 1: Illustration of key concepts: (a) schematic of radial velocity measurement, (b) radial\nvelocity in the vicinity of the Binghamton, NY radar station during a bird migration event on Sep 11,\n2010, (c) aliased radial velocity, (d) a nationwide mosaic of raw radial velocity data is not easily\ninterpretable, but we can extract a velocity \ufb01eld from this information (arrows). See text for explanation.\n\nAliasing. Aliasing complicates the interpretation of radial velocity data. Due to the sampling\nfrequency of the radars, radial velocities can only be resolved up to the Nyquist velocity $V_{\\max}$, which\ndepends on the operating mode of the radar. If the magnitude of the true radial velocity $r_i = a_i^T z_i$\nexceeds $V_{\\max}$, then the measurement will be aliased. The aliasing operation is mathematically\nequivalent to the modulus operation: for any real number $r$, de\ufb01ne the aliased measurement of $r$\nto be $\\bar{r} := r \\bmod 2V_{\\max}$, with the convention that $\\bar{r}$ lies in the interval $[-V_{\\max}, V_{\\max}]$ instead of\n$[0, 2V_{\\max}]$. The values $\\bar{r} + 2kV_{\\max}$, $k \\in \\mathbb{Z}$, will all result in the same aliased measurement, and\nare called aliases. Effectively, this means that radial velocities will \u201cwrap around\u201d at $\\pm V_{\\max}$. For\nexample, Figure 1(c) shows the same data as Figure 1(b), but before aliasing errors have been\ncorrected. 
In this example $V_{\\max} = 11$ $\\mathrm{m\\,s^{-1}}$. We see that the fastest approaching birds in the\nnortheast quadrant appear to be departing (red), instead of approaching (dark green).\nMultiple Radar Stations. The interpretation of radial velocity is station-speci\ufb01c. Figure 1(d) shows\na nationwide mosaic of radial velocity from individual stations, overlaid by a velocity \ufb01eld. The\nmosaic is very dif\ufb01cult to interpret, due to abrupt changes at the boundaries between station coverage\nareas. Thus, although we are very accustomed to seeing nationwide composites of radar re\ufb02ectivity,\nradial velocity data is not presented or analyzed in this way. This is the main problem we seek to\nremedy in this work, by reconstructing velocity \ufb01elds of the type overlaid on Figure 1(d).\nRelated Work. The uniform velocity model [16], described above, makes a strong spatial uniformity\nassumption to reconstruct velocities at different heights in the immediate vicinity of one radar station.\nVariants of this method are known as velocity volume pro\ufb01ling (VVP) or velocity-azimuthal display\n(VAD). The uniformity assumptions prevent these algorithms from reconstructing spatially varying\nvelocity \ufb01elds or combining information from multiple radars. Multi-Doppler methods combine\nmeasurements from two or more radars to reconstruct full velocity vectors at points within the overlap\nof their domains [16, 21, 22]. No spatial smoothness assumptions are made. Full velocity \ufb01elds can\nbe reconstructed, but only within the overlap of radar domains. Dealiasing is the process of correcting\naliasing errors to guess the true radial velocity, usually by making smoothness assumptions or using\nsome external information [23]. 
Almost all previous work treats the different analytical challenges\n(reconstruction from spatial cues, multiple stations, dealiasing) separately; a few methods combine\ndealiasing with VVP or multi-Doppler methods [9, 24, 25]. Our method extends all of these methods\ninto a single, elegant, joint probabilistic model.\n\n3 Modeling Latent Velocities\n\nIn this section, we present our joint probabilistic model for radial velocity measurements and latent\nvelocities. We begin by considering the problem in the absence of aliasing, and return to aliasing in\nSection 4.\nLikelihood in the absence of aliasing. Let $O_i$ be the set of stations that measure radial velocities\nat location $x_i$. The likelihood of a single radial velocity measurement $y_{ij}$, in the absence of aliasing,\ngiven the latent velocity $z_i$ and the radial axis $a_{ij}$, is Gaussian around the perfect radial velocity\nmeasurement of the ground-truth latent velocity\n\n$$p(y_{ij} \\mid z_i; x_i) = \\mathcal{N}(y_{ij}; a_{ij}^T z_i, \\sigma^2). \\tag{1}$$\n\nThe observed radial velocity measurements are conditionally independent given the latent velocities,\nso the joint likelihood factorizes completely\n\n$$p(y \\mid z; x) = \\prod_i \\prod_{j \\in O_i} p(y_{ij} \\mid z_i; x_i) = \\prod_i \\prod_{j \\in O_i} \\mathcal{N}(y_{ij}; a_{ij}^T z_i, \\sigma^2). \\tag{2}$$\n\nGP prior. We model the latent velocity \ufb01eld as a vector-valued GP. The GP prior has a zero-valued\nmean function and a modi\ufb01ed squared exponential kernel. 
Since the GP is vector-valued, the output\nof the kernel function is a 3 \u00d7 3 matrix of the following form.\n\n$$\\kappa_\\theta(x_i, x_j) = \\mathrm{diag}\\left(\\exp\\left(\\frac{-d_\\alpha(x_i, x_j)}{2\\beta_u}\\right), \\exp\\left(\\frac{-d_\\alpha(x_i, x_j)}{2\\beta_v}\\right), \\exp\\left(\\frac{-d_\\alpha(x_i, x_j)}{2\\beta_w}\\right)\\right) \\tag{3}$$\n\n$$d_\\alpha(x_i, x_j) = \\alpha_1 (x_{i,1} - x_{j,1})^2 + \\alpha_2 (x_{i,2} - x_{j,2})^2 + \\alpha_3 (x_{i,3} - x_{j,3})^2 \\tag{4}$$\n\nThe hyperparameters $\\theta = [\\alpha, \\beta]$ are the length scales which control the uniformity of the latent\nvelocity \ufb01eld.\n\nCovariance between measurements. Our approach to inferring the latent velocities relies on\nthe ability to jointly model the radial velocity measurements with the latent velocities. In order\nto accomplish this, we need to have a covariance function relating radial velocity measurements.\nIntuitively this seems problematic, since the radial velocity measurements depend not only on the\nlocation of the measurement, but also on the location of the station making the measurement. As it turns\nout, applying de\ufb01nitions and the process by which radial velocity measurements are made gives the\nfollowing elegant covariance function.\n\n$$\\mathrm{Cov}(y_{ij}, y_{i'j'}) = \\mathbb{E}[y_{ij} y_{i'j'}] = a_{ij}^T \\, \\mathbb{E}[z_i z_{i'}^T] \\, a_{i'j'} = a_{ij}^T \\kappa_\\theta(x_i, x_{i'}) a_{i'j'}$$\n\nObserve that this covariance function is not stationary, since it relies on the locations of the stations\nfrom which the measurements were made.\n\nJoint modeling of measurements and latent velocities. The joint probability distribution between\nthe radial velocity measurements and the latent velocities is\n\n$$p(y, z; x) = p(y \\mid z; x)\\, p(z; x). \\tag{5}$$\n\nSince both the likelihood and prior are Gaussian, the joint is also Gaussian. All we need to do to\nfully specify the joint distribution is to solve for the \ufb01rst two moments of the joint. 
The joint mean is\nclearly zero. Let $q^T = [z^T \\; y^T]$, let $A = \\mathrm{diag}(\\{a_{ij}^T \\mid \\forall i, j \\in O_i\\}) \\in \\mathbb{R}^{3n \\times n}$ be the matrix de\ufb01ned\nso that $y \\sim \\mathcal{N}(Az, \\sigma^2 I)$, and let $K$ be the prior covariance matrix. The covariance of the joint is as\nfollows\n\n$$\\mathbb{E}[qq^T] = \\mathbb{E}\\left[\\begin{bmatrix} z \\\\ y \\end{bmatrix} \\begin{bmatrix} z^T & y^T \\end{bmatrix}\\right] = \\begin{bmatrix} K & KA^T \\\\ AK^T & AKA^T + \\sigma^2 I \\end{bmatrix} \\tag{6}$$\n\nHence, the joint distribution is\n\n$$p(y, z; x) = \\mathcal{N}\\left(\\begin{bmatrix} z \\\\ y \\end{bmatrix}; 0, \\begin{bmatrix} K & KA^T \\\\ AK^T & AKA^T + \\sigma^2 I \\end{bmatrix}\\right). \\tag{7}$$\n\nAlgorithm 1 Ef\ufb01cient Inference using Laplace\u2019s Method\n1: procedure INFERLATENTVELOCITIES\n2:   Initialize $\\nu^{(0)}$ randomly   \u25b7 $\\nu^{(0)} = K^{-1} z^{(0)}$\n3:   Initialize $\\Delta\\nu = \\infty$\n4:   while $|\\Delta\\nu| > \\tau$ do   \u25b7 $\\tau$ is some user-de\ufb01ned threshold\n5:     Compute $b = W z^{(k)} + \\nabla l(z^{(k)})$\n6:     Compute $\\gamma = (W^{-1} + K)^{-1} K b$ using the conjugate gradient method\n7:     Let $\\Delta\\nu = b - \\gamma - \\nu^{(k)}$\n8:     Set $\\nu^{(k+1)} = \\nu^{(k)} + \\eta \\Delta\\nu$   \u25b7 Use Brent\u2019s method to do a line search for $\\eta$\n9:   return $z^* = K\\nu^*$\n\nNaive Exact Inference. Given this joint distribution, we can perform exact inference via Gaussian\nconditioning. 
The posterior mean is\n\n$$\\mathbb{E}[z \\mid y; x] = KA^T (AKA^T + \\sigma^2 I)^{-1} y. \\tag{8}$$\n\nWe can also predict directly at locations $\\tilde{z}$ other than those where measurements were made using the\ncross-covariance matrix $\\tilde{K}$ between the locations where measurements were made and prediction\nlocations:\n\n$$\\mathbb{E}[\\tilde{z} \\mid y; x] = \\tilde{K} A^T (AKA^T + \\sigma^2 I)^{-1} y. \\tag{9}$$\n\nThis method of inference is not scalable since it has cubic time complexity and quadratic space\ncomplexity in the number of measurements.\n\n4 Ef\ufb01cient Inference\n\nIn this section, we discuss how we can perform ef\ufb01cient exact inference despite the lack of a stationary\nkernel.\n\n4.1 Laplace\u2019s Method for Exact Inference\n\nIn order to make inference tractable, we would like to use fast FFT-based methods such as SKI\nand KISS-GP [18], but unfortunately these methods require the kernel to be stationary. To overcome\nhaving a non-stationary kernel, we apply Laplace\u2019s method [26]. This is conventionally used for\napproximate inference when the likelihood is not Gaussian, but we use it to be able to utilize fast\nkernel operations for the latent GP, which is stationary, and the method will still be exact. Laplace\u2019s\nmethod replaces one-shot matrix inversion based inference with an iterative algorithm where the most\ncomplicated operation is kernel-vector multiplication. If we pick locations to observe radial velocity\nmeasurements on a grid $\\Omega$, we can perform the matrix-vector multiplication $Ks$, for an arbitrary\nvector $s$, in $O(n \\log n)$ time, where $n = |\\Omega|$.\nThe exact inference procedure we employ is presented in Algorithm 1. Laplace\u2019s method iteratively\noptimizes $\\log p(z \\mid y; x)$ by optimizing the second-order Taylor expansion around the current iterate\nof $z$ via an auxiliary variable $\\nu = K^{-1} z$. Let $l(z) = \\log p(y \\mid z; x)$ be the log likelihood function,\n$\\nabla l(z)$ be the gradient of the log likelihood, and $W = -\\nabla^2 l(z)$ be the negative Hessian. 
The most\nchallenging operation to make ef\ufb01cient is Line 6 of Algorithm 1. We use the conjugate gradient\nmethod to iteratively compute $\\gamma$. The upshot is that we only need to be able to ef\ufb01ciently compute\n$W^{-1}$, multiply $W^{-1}$ times arbitrary vectors, and multiply $K$ times arbitrary vectors. $W$ is block\ndiagonal with 3 \u00d7 3 blocks, which makes for linear time matrix-vector multiplication and inversion.\nThe only other bottleneck for both speed and storage is the kernel matrix.\n\n4.2 Using Grid Structure for Fast Matrix-Vector Multiplication\n\nIn this section, we detail how we can perform ef\ufb01cient kernel-vector multiplication by exploiting the\nspecial structure of the kernel matrix following techniques presented by Wilson [27]. To accomplish\nthis we need to choose the measurements to use as observations from an evenly spaced grid. In most\ncases, we will not have measurements for all grid points, so we use pseudo-observations to enable the\nuse of grid-based methods.\n\n4.2.1 Missing Observations\n\nLet $\\Omega$ be the set of grid locations where we would like to have radial velocity measurements, and let\n$\\hat{\\Omega}$ and $\\tilde{\\Omega}$ be the locations where we have and do not have radial velocity measurements, respectively.\nFor all grid locations $x_i \\in \\tilde{\\Omega}$, we sample a pseudo radial velocity measurement $y_i \\sim \\mathcal{N}(0, \\epsilon^{-1})$, for\nsome small $\\epsilon$. This implies the following joint log likelihood:\n\n$$l(z) = \\sum_i \\left( \\mathbb{1}[x_i \\in \\tilde{\\Omega}] \\log \\mathcal{N}(y_i; 0, \\epsilon^{-1}) + \\mathbb{1}[x_i \\in \\hat{\\Omega}] \\left( \\sum_{j \\in O_i} \\log \\mathcal{N}(y_{ij}; a_{ij}^T z_i, \\sigma^2) \\right) \\right). \\tag{10}$$\n\n4.2.2 Kronecker-Toeplitz Structure\n\nThe latent GP can be decomposed into three independent GPs, namely over the $u$, $v$, and $w$\ncomponents of the latent velocities, respectively. 
Let $K_u$, $K_v$, and $K_w$ be kernel matrices for each\nof these GPs, respectively, and all have shape $n \\times n$. When performing the multiplication $Ks$, we\ndecompose $s$ into its $u$, $v$, and $w$ component sub-vectors denoted $s_u$, $s_v$, and $s_w$, respectively. Then,\nwe perform each of the multiplications $K_u s_u$, $K_v s_v$, and $K_w s_w$, and recombine the results to get\n$Ks$. All of these three multiplications are similar since $K_u$, $K_v$, and $K_w$ all have the same structure.\nWe use $K_u$ as an example and follow the method proposed by Wilson [27]. $K_v$ and $K_w$ follow the\nsame form. $K_u$ decomposes into the Kronecker product $K_{u,1} \\otimes K_{u,2} \\otimes K_{u,3}$, where $K_{u,1}$, $K_{u,2}$,\nand $K_{u,3}$ are all Toeplitz, since $K_u$ is stationary. $K_{u,1}$ has shape $n_1 \\times n_1$, $K_{u,2}$ has shape $n_2 \\times n_2$,\nand $K_{u,3}$ has shape $n_3 \\times n_3$, where $n_1$, $n_2$, and $n_3$ are the dimensions of the grid, respectively.\nHence, $n = n_1 n_2 n_3$. Let $S_u$ be the $n_1 \\times n_2 \\times n_3$ tensor formed by reshaping $s_u$ to match the grid\ndimensions. Then\n\n$$K_u s_u = \\left(\\bigotimes_{i=1}^{3} K_{u,i}\\right) s_u = \\mathrm{vec}\\left(S_u \\times_1 K_{u,1} \\times_2 K_{u,2} \\times_3 K_{u,3}\\right).$$\n\nHere, the operation $T \\times_i M_i$ denotes the $i$-mode product of the tensor $T \\in \\mathbb{R}^{n_1 \\times n_2 \\times n_3}$ and matrix\n$M_i \\in \\mathbb{R}^{n_i \\times n_i}$. The result is another tensor $T'$ with the same dimensions. It is computed by \ufb01rst\nreshaping $T$ into a matrix $T_{(i)}$ of size $n_i \\times \\prod_{j \\neq i} n_j$, then computing the matrix product $M_i T_{(i)}$,\nand \ufb01nally reshaping the result back into an $n_1 \\times n_2 \\times n_3$ tensor; see [28] for details. In our case,\nsince each matrix multiplication is between a Toeplitz matrix $K_{u,i}$ and a matrix $T_{(i)}$ with $n$ entries,\nit can be done in $O(n \\log n)$ time using the FFT [29]. Therefore, the overall running time is also\n$O(n \\log n)$.\n\n4.3 Handling Aliased Data\n\nIn this section, we extend our model to handle aliased radial velocity measurements. 
Recall that\naliasing means that radial velocities are only known up to an additive multiple of twice the Nyquist\nvelocity $V_{\\max}$, which varies by operating mode of the radar. Conditions favorable for bird migration\noften correspond to low values of $V_{\\max}$ and exacerbate aliasing problems.\nTo accommodate aliasing, we change the likelihood to model the aliasing process using a wrapped\nnormal likelihood [20]:\n\n$$p(y_{ij} \\mid z_i; x_i) = \\mathcal{N}_w(y_{ij} \\mid a_{ij}^T z_i, \\sigma^2) = \\sum_{k=-\\infty}^{\\infty} \\mathcal{N}(y_{ij} + 2kV_{\\max,j}; a_{ij}^T z_i, \\sigma^2) \\tag{11}$$\n\nThis is simply the marginal density of all aliases of $y_{ij}$. The in\ufb01nite sum cannot be computed\nanalytically, so we approximate it with a \ufb01nite number of aliases, $\\ell$, which is known to perform\nwell [9, 30, 31].\n\n$$p(y_{ij} \\mid z_i; x_i) \\approx \\mathcal{N}_w^{\\ell}(y_{ij} \\mid a_{ij}^T z_i, \\sigma^2) = \\sum_{k=-\\ell}^{\\ell} \\mathcal{N}(\\overline{y_{ij} - a_{ij}^T z_i} + 2kV_{\\max,j}; 0, \\sigma^2) \\tag{12}$$\n\nRecall that $\\bar{r}$ aliases $r$ to the interval $[-V_{\\max}, V_{\\max}]$, so the sum on the right-hand side is over the\n$2\\ell + 1$ aliases of $y_{ij}$ that are closest to the predicted value $a_{ij}^T z_i$. Since our ef\ufb01cient inference method\nrelies on the likelihood only through its gradient and Hessian, we can simply plug these new\nfunctions into the algorithm presented in Algorithm 1. Observe that this likelihood is no longer\nGaussian, and thus we are no longer performing exact inference using Laplace\u2019s method.\n\n5 Experiments\n\nIn this section, we present the results from experiments to evaluate the effectiveness of the method we\npresented in the previous section. The \ufb01rst two experiments analyze data scans from 13 radar stations\nfrom the northeast US on the night of September 11, 2010. 
In all experiments, hyperparameters are\n\ufb01xed at values chosen through preliminary experiments to match the expected smoothness of the\ndata, so that the RMSE between inferred radial velocities and raw measurements match values from\nvelocity models used in prior research [7, 9].\n\nComparison of inference methods. First, we compare our fast inference method against the naive\ninference method. In our experiments we \ufb01rst resample data from all radar stations onto a \ufb01xed\nresolution grid. Each grid point has zero or more observations from different radar stations. The naive\nmethod operates only on the actual observations m, and its running time is O(m3). Our grid-based\nmethod operates on all n grid points, and its per-iteration running time is O(n log n). To tractably\nperform naive inference we must subsample the m observations even further. We consider a range of\ndifferent sizes both for the base grid and the subsampled data set for the naive method.\nFigure 2 shows the time vs. error for six different methods. The data set consists of radar scans\nfrom 13 radar stations from the northeast US on the night of September 11, 2010, and, for this\ntest, is preprocessed to eliminate aliasing errors [9]. Error is measured by \ufb01rst inferring the full\nvelocity vector for each observation and then projecting it using the station-speci\ufb01c geometry to\ncompute the RMSE between the predicted and observed radial velocities. To fairly compare RMSE\nvalues across the six methods, the naive method must predict values for all observations, not just its\nsubsample. To do this, we use the method presented in Equation 9. Each method was run on six\ndifferent three-dimensional grids with total sizes ranging from 51,200 to 219,700 grid points. We\ncompare our fast inference method against \ufb01ve different subsample sizes for the naive method. 
Every\nexperiment was run 10 times and the average time and RMSE is reported in Figure 2.\nThe grid-based Laplace\u2019s method vastly outperforms the naive method. Not only does the naive\nmethod get slower with an increase in grid size, but it also starts to perform worse, since it has to\nmake predictions at a \ufb01ner resolution from the same number of subsampled observations. Note that\nthe naive method is also making predictions at roughly an order of magnitude fewer locations than\nthe fast method because there are many grid points with zero observations.\n\nComparison of likelihood functions. Next, we show in Figure 4 the importance of the wrapped\nnormal likelihood when dealing with aliased data. We use the raw radial velocity data from 13 radar\nstations in the northeast US from the night of September 11, 2010. Figure 4(a) shows the inferred\nvelocity \ufb01eld using our method with the Gaussian likelihood and Figure 4(b) shows the inferred\nvelocity \ufb01eld using our method with the wrapped normal likelihood. Observe the region of the\nvelocity \ufb01eld highlighted by the rectangle. The inference method with Gaussian likelihood fails to\ninfer a reasonable velocity \ufb01eld in the presence of heavily aliased radial velocity measurements and\nhas a substantially higher RMSE1 than the method with the wrapped normal likelihood. The latter\nmodel correctly infers from raw aliased radial velocities that the birds over those stations are \ufb02ying in\nthe same general direction as birds over nearby stations.\n\n1For aliased data, RMSE is measured between the observed value and the closest alias of the predicted value.\n\n7\n\n\fFigure 2: Time vs. RMSE of radial\nvelocity measurements using six dif-\nferent methods for latent velocity in-\nference.\n\nFigure 3: Density and velocity of bird migration on night of\nMay 2, 2015. 
Northward migration occurs across the US,\nand is intense in the central US.\n\n(a) Gaussian Likelihood, RMSE=6.21\n\n(b) Wrapped Normal Likelihood, RMSE=4.61\n\nFigure 4: Inference method performance using two likelihood functions on aliased data. Grid size is\n100 \u00d7 100 \u00d7 9; only the lowest elevation (500 m above ground level) is displayed.\n\nScaling to the continental US. A unique aspect of our method is that it can, for the \ufb01rst time,\nassimilate data from all radar stations to reconstruct spatially detailed velocity \ufb01elds across the whole\nUS. An example is shown in Figure 1(d), which depicts northward bird migration on the night of\nMay 2, 2015. The grid size is 240 \u00d7 120 \u00d7 10; only the lowest elevation and every 5th velocity\nmeasurement is plotted. The reconstructed velocities can be combined with re\ufb02ectivity data as\nshown in Figure 3 to observe both the density and velocity of migration. Future work can conduct\nquantitative analyses of migration biology using these measurements.\n\n6 Conclusion and Future Work\n\nWe presented the \ufb01rst comprehensive solution to the problem of inferring latent velocities from\nradial velocity measurements from weather radar stations across the US. Our end-to-end\nprobabilistic model begins with raw radial velocity from many radar stations, and outputs valuable\ninformation about migration patterns of birds at scale. We presented a novel method to perform fast\ngrid-based posterior inference even though our GP does not have a stationary kernel. The results\nof our methods can be used by ecologists to expand human knowledge about bird movements to\nadvance conservation efforts and science.\nOur current method is most suited to smooth velocity \ufb01elds, such as those that occur during bird\nmigration. A promising line of future work is to extend our techniques to infer wind velocity \ufb01elds by\nmeasuring velocity of precipitation and wind-borne particles. 
We anticipate that our GP methodology\n\n8\n\n\fcan also apply to this domain, but we will need to experiment with different kernels better suited to\nthese velocity \ufb01elds, which can be much more complex.\n\nAcknowledgments\n\nThis material is based upon work supported by the National Science Foundation under Grant Nos.\n1522054 and 1661259.\n\nReferences\n[1] Timothy D. Crum and Ron L. Alberty. The WSR-88D and the WSR-88D operational support\n\nfacility. Bulletin of the American Meteorological Society, 74(9):1669\u20131687, 1993.\n\n[2] Thomas H. Kunz, Sidney A. Gauthreaux, Jr, Nickolay I. Hristov, Jason W. Horn, Gareth Jones,\nElisabeth K. V. Kalko, Ronald P. Larkin, Gary F. McCracken, Sharon M. Swartz, Robert B.\nSrygley, Robert Dudley, John K. Westbrook, and Martin Wikelski. Aeroecology: probing and\nmodeling the aerosphere. Integrative and Comparative Biology, 48(1):1\u201311, 2008.\n\n[3] J.T. Johnson, Pamela L. MacKeen, Arthur Witt, E. De Wayne Mitchell, Gregory J. Stumpf,\nMichael D. Eilts, and Kevin W. Thomas. The storm cell identi\ufb01cation and tracking algorithm:\nAn enhanced WSR-88D algorithm. Weather and forecasting, 13(2):263\u2013276, 1998.\n\n[4] Richard A. Fulton, Jay P. Breidenbach, Dong-Jun Seo, Dennis A. Miller, and Timothy O\u2019Bannon.\n\nThe WSR-88D rainfall algorithm. Weather and Forecasting, 13(2):377\u2013395, 1998.\n\n[5] Andrew Farnsworth, Benjamin M. Van Doren, Wesley M. Hochachka, Daniel Sheldon, Kevin\nWinner, Jed Irvine, Jeffrey Geevarghese, and Steve Kelling. A characterization of autumn\nnocturnal migration detected by weather surveillance radars in the northeastern USA. Ecological\nApplications, 26(3):752\u2013770, 2016. ISSN 1939-5582.\n\n[6] Jeffrey J. Buler and Robert H. Diehl. Quantifying bird density during migratory stopover using\nweather surveillance radar. IEEE Transactions on Geoscience and Remote Sensing, 47(8):\n2741\u20132751, 2009.\n\n[7] Adriaan M. 
Dokter, Felix Liechti, Herbert Stark, Laurent Delobbe, Pierre Tabary, and Iwan\nHolleman. Bird migration flight altitudes studied by a network of operational weather radars.\nJournal of the Royal Society Interface, page rsif20100116, 2010.\n\n[8] Judy Shamoun-Baranes, Andrew Farnsworth, Bart Aelterman, Jose A. Alves, Kevin Azijn,\nGarrett Bernstein, S\u00e9rgio Branco, Peter Desmet, Adriaan M. Dokter, Kyle Horton, Steve\nKelling, Jeffrey F. Kelly, Hidde Leijnse, Jingjing Rong, Daniel Sheldon, Wouter Van den\nBroeck, Jan Klaas Van Den Meersche, Benjamin Mark Van Doren, and Hans van Gasteren.\nInnovative Visualizations Shed Light on Avian Nocturnal Migration. PLoS ONE, 11(8):1\u201315,\n2016.\n\n[9] Daniel R. Sheldon, Andrew Farnsworth, Jed Irvine, Benjamin Van Doren, Kevin F. Webb,\nThomas G. Dietterich, and Steve Kelling. Approximate Bayesian Inference for Reconstructing\nVelocities of Migrating Birds from Weather Radar. In AAAI, 2013.\n\n[10] Aruni RoyChowdhury, Daniel Sheldon, Subhransu Maji, and Erik Learned-Miller. Distinguishing\nWeather Phenomena from Bird Migration Patterns in Radar Imagery. In CVPR Workshop\non Perception Beyond the Visual Spectrum (PBVS), pages 1\u20138, 2016.\n\n[11] Kyle G. Horton, Benjamin M. Van Doren, Frank A. La Sorte, Daniel Fink, Daniel Sheldon,\nAndrew Farnsworth, and Jeffrey F. Kelly. Navigating north: how body mass and winds shape\navian flight behaviours across a North American migratory flyway. Ecology Letters, 0(0).\n\n[12] Frank La Sorte, Wesley Hochachka, Andrew Farnsworth, Daniel Sheldon, Daniel Fink, Jeffrey\nGeevarghese, Kevin Winner, Benjamin Van Doren, and Steve Kelling. Migration timing and\nits determinants for nocturnal migratory birds during autumn migration. Journal of Animal\nEcology, 84(5):1202\u20131212, 2015.\n\n[13] Frank A. La Sorte, Wesley M. Hochachka, Andrew Farnsworth, Daniel Sheldon, Benjamin M.\nVan Doren, Daniel Fink, and Steve Kelling. 
Seasonal changes in the altitudinal distribution of\nnocturnally migrating birds during autumn migration. 2(12):1\u201315, 2015.\n\n[14] Kyle G. Horton, Benjamin M. Van Doren, Phillip M. Stepanian, Wesley M. Hochachka, Andrew\nFarnsworth, and Jeffrey F. Kelly. Nocturnally migrating songbirds drift when they can and\ncompensate when they must. Scientific Reports, 6:21249, 2016.\n\n[15] Benjamin M. Van Doren, Kyle G. Horton, Adriaan M. Dokter, Holger Klinck, Susan B. Elbin,\nand Andrew Farnsworth. High-intensity urban light installation dramatically alters nocturnal\nbird migration. Proceedings of the National Academy of Sciences, 114(42):11175\u201311180, 2017.\n\n[16] Richard J. Doviak. Doppler radar and weather observations. Courier Corporation, 1993.\n\n[17] Michael L. Stein, Jie Chen, and Mihai Anitescu. Stochastic Approximation of Score Functions\nfor Gaussian Processes. The Annals of Applied Statistics, 7(2):1162\u20131191, 2013.\n\n[18] Andrew Wilson and Hannes Nickisch. Kernel interpolation for scalable structured Gaussian\nprocesses (KISS-GP). In International Conference on Machine Learning, pages 1775\u20131784,\n2015.\n\n[19] Jonathan R. Stroud, Michael L. Stein, and Shaun Lysen. Bayesian and Maximum Likelihood\nEstimation for Gaussian Processes on an Incomplete Lattice. Journal of Computational and\nGraphical Statistics, 26(1):108\u2013120, 2017.\n\n[20] Ernst Breitenberger. Analogues of the Normal Distribution on the Circle and the Sphere.\nBiometrika, 50(1/2):81\u201388, 1963.\n\n[21] Peter S. Ray and Karen L. Sangren. Multiple-Doppler Radar Network Design. Journal of\nClimate and Applied Meteorology, 22(8):1444\u20131454, 1983.\n\n[22] Edin Insanic and Paul R. Siqueira. A Maximum Likelihood Approach to Estimation of Vector\nVelocity in Doppler Radar Networks. IEEE Transactions on Geoscience and Remote Sensing,\n50(2):553\u2013567, 2012.\n\n[23] William R. Bergen and Steven C. Albers. 
Two- and Three-dimensional De-aliasing of Doppler\nRadar Velocities. Journal of Atmospheric and Oceanic Technology, 5(2):305\u2013319, 1988.\n\n[24] Pierre Tabary, Georges Scialom, and Urs Germann. Real-Time Retrieval of the Wind from\nAliased Velocities Measured by Doppler Radars. Journal of Atmospheric and Oceanic Technology,\n18(6):875\u2013882, 2001.\n\n[25] Jidong Gao and Kelvin K. Droegemeier. A Variational Technique for Dealiasing Doppler Radial\nVelocity Data. Journal of Applied Meteorology, 43(6):934\u2013940, 2004.\n\n[26] Carl Edward Rasmussen and Christopher K.I. Williams. Gaussian Processes for Machine\nLearning (Adaptive Computation and Machine Learning). The MIT Press, 2005.\nISBN 026218253X.\n\n[27] Andrew Gordon Wilson. Covariance kernels for fast automatic pattern discovery and extrapolation\nwith Gaussian processes. PhD thesis, University of Cambridge, 2014.\n\n[28] Tamara G. Kolda and Brett W. Bader. Tensor Decompositions and Applications. SIAM Review,\n51(3):455\u2013500, 2009.\n\n[29] Martin Ohsmann. Fast transforms of Toeplitz matrices. Linear Algebra and its Applications,\n231:181\u2013192, 1995.\n\n[30] Yannis Agiomyrgiannakis and Yannis Stylianou. Wrapped Gaussian mixture models for\nmodeling and high-rate quantization of phase data of speech. IEEE Transactions on Audio,\nSpeech, and Language Processing, 17(4):775\u2013786, 2009.\n\n[31] Claus Bahlmann. Directional features in online handwriting recognition. Pattern Recognition,\n39(1):115\u2013125, 2006.\n", "award": [], "sourceid": 5377, "authors": [{"given_name": "Rico", "family_name": "Angell", "institution": "University of Massachusetts"}, {"given_name": "Daniel", "family_name": "Sheldon", "institution": "University of Massachusetts Amherst"}]}