{"title": "Band-Limited Gaussian Processes: The Sinc Kernel", "book": "Advances in Neural Information Processing Systems", "page_first": 12749, "page_last": 12759, "abstract": "We propose a novel class of Gaussian processes (GPs) whose spectra have compact support, meaning that their sample trajectories are almost-surely band limited. As a complement to the growing literature on spectral design of covariance kernels, the core of our proposal is to model power spectral densities through a rectangular function, which results in a kernel based on the sinc function with straightforward extensions to non-centred (around zero frequency) and frequency-varying cases. In addition to its use in regression, the relationship between the sinc kernel and the classic theory is illuminated, in particular, the Shannon-Nyquist theorem is interpreted as posterior reconstruction under the proposed kernel. Additionally, we show that the sinc kernel is instrumental in two fundamental signal processing applications: first, in stereo amplitude modulation, where the non-centred sinc kernel arises naturally. Second, for band-pass filtering, where the proposed kernel allows for a Bayesian treatment that is robust to observation noise and missing data. The developed theory is complemented with illustrative graphic examples and validated experimentally using real-world data.", "full_text": "Band-Limited Gaussian Processes:\n\nThe Sinc Kernel\n\nFelipe Tobar\n\nCenter for Mathematical Modeling\n\nUniversidad de Chile\n\nftobar@dim.uchile.cl\n\nAbstract\n\nWe propose a novel class of Gaussian processes (GPs) whose spectra have compact\nsupport, meaning that their sample trajectories are almost-surely band limited. 
As a complement to the growing literature on spectral design of covariance kernels, the core of our proposal is to model power spectral densities through a rectangular function, which results in a kernel based on the sinc function with straightforward extensions to non-centred (around zero frequency) and frequency-varying cases. In addition to its use in regression, the relationship between the sinc kernel and the classic theory is illuminated; in particular, the Shannon-Nyquist theorem is interpreted as posterior reconstruction under the proposed kernel. Additionally, we show that the sinc kernel is instrumental in two fundamental signal processing applications: first, in stereo amplitude modulation, where the non-centred sinc kernel arises naturally; second, in band-pass filtering, where the proposed kernel allows for a Bayesian treatment that is robust to observation noise and missing data. The developed theory is complemented with illustrative graphic examples and validated experimentally using real-world data.

1 Introduction

1.1 Spectral representation and Gaussian processes

The spectral representation of time series is both meaningful and practical in a plethora of scientific domains. From seismology to medical imaging, and from astronomy to audio processing, understanding which fraction of the energy in a time series is contained in a specific frequency band is key for, e.g., detecting critical events, reconstruction, and denoising. The literature on spectral estimation [13, 24] enjoys a long-standing reputation with proven success in real-world applications in discrete-time signal processing and related fields.
For unevenly-sampled, noise-corrupted observations, Bayesian approaches to spectral representation emerged in the late 1980s and early 1990s [4, 11, 8], thus reformulating spectral analysis as an inference problem which benefits from the machinery of Bayesian probability theory [12].

In parallel to the advances of spectral analysis, the interface between probability, statistics and machine learning (ML) witnessed the development of Gaussian processes (GP, [21]), a nonparametric generative model for time series with unparalleled modelling abilities and unique conjugacy properties for Bayesian inference. GPs are the de facto model in the ML community for learning (continuous-time) time series in the presence of unevenly-sampled observations corrupted by noise. Recent GP models rely on Bochner's theorem [2], which indicates that the covariance kernel and power spectral density (PSD) of a stationary stochastic process are Fourier pairs, to construct kernels by direct parametrisation of PSDs, to then express the kernel via the inverse Fourier transform. The precursor of this concept in ML is the spectral-mixture kernel (SM, [32]), which models PSDs as Gaussian RBFs, and its multivariate extensions [29, 19].

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.
Accordingly, spectral-based sparse GP approximations [15, 10, 5, 14] also provide improved computational efficiency.

1.2 Contribution and organisation

A fundamental object across the signal processing toolkit is the normalised sinc function, defined by

sinc(x) = sin(πx) / (πx).   (1)

Its importance stems from its role as the optimal basis for reconstruction (in the Shannon-Whittaker sense [31]) and the fact that its Fourier transform is the rectangle function, which has compact support. Our hypothesis is that the symbiosis between spectral estimation and GPs can greatly benefit from the properties of kernels inspired by the sinc function, yet this has not been studied in the context of GPs. In a nutshell, we propose to parametrise the PSD by a (non-centred) rectangular function, thus yielding kernels defined by a sinc function times a cosine, resembling the SM kernel [32] with the main distinction that the proposed PSD has compact, rather than infinite, support.

The next section introduces the proposed sinc kernel, its centred/non-centred/frequency-varying variants, as well as its connections to sum-of-infinite-sinusoids models. Section 3 interprets posterior reconstruction using the sinc kernel from the Shannon-Nyquist perspective. Then, Sections 4 and 5 review the role of the sinc kernel in two signal processing applications: stereo demodulation and band-pass filtering. Lastly, Section 6 validates the proposed kernel through numerical experiments with real-world signals, and Section 7 presents the future research steps and main conclusions.

2 Compact spectral support via the sinc kernel

The Bochner theorem [2] establishes the connection between a (stationary) positive definite kernel K and a density S via the Fourier transform F{·}, that is,

K(t) = F⁻¹{S(ξ)}(t),   (2)

where the function S : Rⁿ → R₊ is Lebesgue integrable.
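As a quick numerical sanity check of eq. (2), one can discretise a chosen non-negative S, recover K via the inverse FFT, and verify that the induced Gram matrix is positive semi-definite. The sketch below is illustrative only: the Gaussian PSD (in the spirit of the spectral-mixture family) and all grid settings are arbitrary choices, not part of the paper.

```python
import numpy as np

# Sketch: discretise a non-negative PSD S (here a unit-variance Gaussian,
# an illustrative choice), recover K = F^{-1}{S} on a grid, and check that
# the induced Toeplitz Gram matrix K(t_i - t_j) is positive semi-definite.
n, dt = 4096, 0.05
freqs = np.fft.fftfreq(n, d=dt)
S = np.exp(-freqs**2 / 2)                         # Gaussian PSD, S >= 0
K = np.real(np.fft.ifft(S)) / dt                  # K(tau) on the grid tau = k*dt
idx = np.arange(200)                              # a 200-point uniform time grid
gram = K[np.abs(idx[:, None] - idx[None, :])]     # Gram matrix K(t_i - t_j)
eigs = np.linalg.eigvalsh(gram)
print(eigs.min() >= -1e-8)                        # True: numerically PSD, as Bochner predicts
```

Because the discretised kernel is a principal submatrix of a circulant matrix whose eigenvalues are exactly the (non-negative) samples of S, the positive semi-definiteness check succeeds up to floating-point error.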
This result allows us to design a valid positive definite function K by simply choosing a positive function S (a much easier task), and then applying the inverse Fourier transform according to eq. (2). This is of particular importance in GPs, where we can identify K as the covariance kernel and S as the spectral density; therefore, the design of the GP can be performed in the spectral domain rather than the temporal/spatial one. Though temporal construction is the classical alternative, spectral-based approaches to covariance design have become popular for both scalar and vector-valued processes [32, 29, 19], and even for nonstationary [22] and nonparametric [27, 26] cases.

We next focus on GPs that are band-limited, or in other words, that have a spectral density with compact support, based on the sinc kernel.

2.1 Construction from the inverse Fourier transform of a rectangular spectrum

Let us denote the rectangular function given by

rect(ξ) := 1 if |ξ| < 1/2;  1/2 if |ξ| = 1/2;  0 elsewhere,   (3)

and consider a GP with a power spectral density (PSD), denoted by S, given by the sum of two rectangular functions placed symmetrically¹ wrt the origin at ξ₀ and −ξ₀, with widths equal to Δ and total power equal to σ². We refer to this construction as the symmetric rectangle function with centre ξ₀, width Δ and power σ², denoted by

simrect_{ξ₀,Δ,σ²}(ξ) := (σ²/(2Δ)) ( rect((ξ − ξ₀)/Δ) + rect((ξ + ξ₀)/Δ) ),   (4)

¹We consider PSDs that are symmetric wrt the origin since we focus on real-valued GPs.
Nevertheless, the presented theory can be readily extended to non-symmetric PSDs, which would give rise to complex-valued covariances and thus complex-valued GP trajectories [3, 28].

where the denominator 2Δ ensures that the function integrates to σ²; the explicit dependence on ξ₀, Δ, σ² will only be shown when required. We assume Δ > 0 and ξ₀, σ² ≥ 0, and note that the rectangles are allowed to overlap if Δ > 2ξ₀.

We can then calculate the kernel associated with the PSD given by S(ξ) = simrect_{ξ₀,Δ,σ²}(ξ) using the standard properties of the Fourier transform. In particular, we can do so by identifying the symmetric rectangle function in eq. (4) as a convolution between a (centred) rectangle and two Dirac delta functions at {ξ₀, −ξ₀}. We define this kernel as follows.

Definition 1 (The Sinc Kernel). The stationary covariance kernel resulting from the inverse Fourier transform of the symmetric rectangle function in eq. (4), given by

SK(t) := σ² sinc(Δt) cos(2πξ₀t),   (5)

is referred to as the sinc kernel of frequency ξ₀ ≥ 0, bandwidth Δ ≥ 0 and magnitude σ² ≥ 0. The expression sinc(t) = sin(πt)/(πt) is known as the normalised sinc function, and when ξ₀ = 0 we refer to the above expression as the centred sinc kernel.

Being positive definite by construction, the sinc kernel can be used within a GP for training, inference and prediction. Thus, we implemented a GP with the sinc kernel (henceforth GP-sinc) for the interpolation/extrapolation of a heart-rate time series from the MIT-BIH database [7]. Using one third of the data, training the GP-sinc (plus noise variance) was achieved by maximum likelihood, where both the BFGS [33] and Powell [20] optimisers yielded similar results. Fig.
1 shows the learnt PSD and kernel alongside the periodogram for comparison, and a sample path for temporal reconstruction and forecasting. We highlight that the sinc function implemented in Python used in this optimisation was numerically stable for both optimisers and the multiple initial conditions considered.

Figure 1: Implementation of the sinc kernel on a heart-rate time series. Notice that (i) the learnt kernel shares the same support as the periodogram, (ii) the error bars in the reconstruction are tight, and (iii) the harmonic content in the forecasting part is consistent with the ground truth.

2.2 Construction from a mixture of infinite sinusoids

Constructing kernels for GP models as a sum of infinite components is known to aid the interpretation of their hyperparameters [21]. For the sinc kernel, let us consider an infinite sum of sines and cosines with random magnitudes respectively given by α(ξ), β(ξ) ∼ N(0, σ²) i.i.d., and frequencies between ξ₀ − Δ/2 and ξ₀ + Δ/2. That is,

f(t) = ∫_{ξ₀−Δ/2}^{ξ₀+Δ/2} α(ξ) sin(2πξt) + β(ξ) cos(2πξt) dξ.   (6)

The kernel corresponding to this zero-mean GP can be calculated using basic properties of the Fourier transform, trigonometric identities and the independence of the components' magnitudes. This kernel is stationary and given by the sinc kernel defined in eq. (5):

K(t, t′) = E[f(t)f(t′)] = σ² sinc(Δ(t − t′)) cos(2πξ₀(t − t′)) = SK(t − t′).   (7)

The interpretation of this construction is that the paths of a GP-sinc can be understood as having frequency components that are equally present in the range between ξ₀ − Δ/2 and ξ₀ + Δ/2. On the contrary, frequency components outside this range have zero probability of appearing in the GP-sinc sample paths.
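For concreteness, the kernel of Definition 1 and the resulting GP posterior (eq.-(5) covariance plus the standard GP regression formulas) can be sketched in a few lines of NumPy. The data and hyperparameter values below are placeholders for illustration, not those of the heart-rate experiment above; note that `np.sinc` already implements the normalised sinc.

```python
import numpy as np

def sinc_kernel(t1, t2, sigma2, delta, xi0):
    """Sinc kernel of eq. (5): SK(t - t') = sigma^2 sinc(Delta(t - t')) cos(2 pi xi0 (t - t'))."""
    tau = np.subtract.outer(t1, t2)
    return sigma2 * np.sinc(delta * tau) * np.cos(2 * np.pi * xi0 * tau)

def gp_sinc_posterior(t_obs, y_obs, t_star, sigma2, delta, xi0, noise_var):
    """Posterior mean/variance of a zero-mean GP-sinc given noisy observations."""
    K = sinc_kernel(t_obs, t_obs, sigma2, delta, xi0) + noise_var * np.eye(len(t_obs))
    Ks = sinc_kernel(t_star, t_obs, sigma2, delta, xi0)
    mean = Ks @ np.linalg.solve(K, y_obs)
    var = sigma2 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, var

# Toy usage: near-noiseless samples taken at the Nyquist rate Delta^{-1}.
delta = 0.5
t_obs = np.arange(0, 40.0, 1 / delta)        # spacing Delta^{-1} = 2 seconds
y_obs = np.sin(2 * np.pi * 0.2 * t_obs)      # a toy signal inside the band (0.2 Hz < Delta/2)
mean, var = gp_sinc_posterior(t_obs, y_obs, t_obs, 1.0, delta, 0.0, 1e-10)
print(np.allclose(mean, y_obs, atol=1e-4))   # True: the posterior mean interpolates the samples
```

Because the sinc kernel vanishes at non-zero multiples of Δ⁻¹, the Gram matrix at Nyquist-spaced inputs is diagonal, which is precisely the property exploited in the proof of Proposition 3 below.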
In this sense, we say that the sample trajectories of a GP with sinc kernel are almost surely band-limited, where the band is given by [ξ₀ − Δ/2, ξ₀ + Δ/2].

2.3 Frequency-varying spectrum

The proposed sinc kernel only caters for PSDs that are constant in their (compact) support due to the rectangular model. We extend this construction to band-limited processes with a PSD that is a non-constant function of the frequency. This is equivalent to modelling the PSD as

S(ξ) = simrect_{ξ₀,Δ,σ²}(ξ) Γ(ξ),   (8)

where the symmetric rectangle gives the support of the PSD and the function Γ controls the frequency dependency. Notice that the only relevant part of Γ is that in the support of simrect_{ξ₀,Δ,σ²}(·); furthermore, we assume that Γ is non-negative, symmetric and continuous almost everywhere (the need for this will be clear shortly).

From eq. (8), the proposed sinc kernel can be generalised for the frequency-varying case as

GSK(t) := F⁻¹{ simrect_{ξ₀,Δ,σ²}(ξ) Γ(ξ) } = SK(t) ⋆ K_Γ(t),   (9)

referred to as the generalised sinc kernel, where K_Γ(t) = F⁻¹{Γ(ξ)} is a positive definite function due to (i) the Bochner theorem and (ii) the fact that Γ(ξ) is symmetric and non-negative.

The convolution in the above equation can be computed analytically only in a few cases, most notably when K_Γ(t) is either a cosine or another sinc function, two rather limited scenarios. In the general case, we can take advantage of the compact support of the symmetric rectangle in eq. (8), and express it as a sum of N ∈ ℕ narrower disjoint rectangles of width Δ/N to define an N-th order approximation of GSK(t) through

GSK_N(t) := Σ_{i=1}^{N} F⁻¹{ simrect_{ξ₀^(i), Δ/N, σ²}(ξ) Γ(ξ) }
          ≈ Σ_{i=1}^{N} Γ(ξ₀^(i)) F⁻¹{ simrect_{ξ₀^(i), Δ/N, σ²}(ξ) }
          = Σ_{i=1}^{N} Γ(ξ₀^(i)) σ² sinc(Δt/N) cos(2πξ₀^(i) t),   (10)

where ξ₀^(i) = ξ₀ − Δ(N + 1 − 2i)/(2N), and the approximation in eq. (10) follows from the assumption that Γ(ξ) can be approximated by Γ(ξ₀^(i)) within [ξ₀^(i) − Δ/(2N), ξ₀^(i) + Δ/(2N)], supported by the following remark.

Remark 2. Observe that the expression in eq. (10) can be understood as a Riemann sum using the mid-point value.
Therefore, convergence of GSK_N(t) to GSK(t) as N goes to infinity is guaranteed provided that Γ(·) is Riemann-integrable or, equivalently, Γ(·) is continuous almost everywhere. This is a sound requirement as it is related to the existence of the inverse Fourier transform.

3 Relationship to Nyquist frequency and perfect reconstruction

The Nyquist-Shannon sampling theorem specifies a sufficient condition for perfect, i.e., zero-error, reconstruction of band-limited continuous-time signals using a finite number of samples [23, 17]. Since (i) GP models are intrinsically related to reconstruction, and (ii) the proposed sinc kernel ensures band-limited trajectories almost surely, we now study the reconstruction properties of the GP-sinc from a classical signal processing perspective.

Let us focus on the baseband case (ξ₀ = 0), in which we obtain the centred sinc kernel given by

SK(t) = σ² sinc(Δt).   (11)

For a centred GP-sinc, f(t) ∼ GP(0, SK) with SK as in eq. (11), the Nyquist frequency is given by the width of its PSD, that is, Δ. The following proposition establishes the interpretation of Nyquist perfect reconstruction from the perspective of a vanishing posterior variance for a centred GP-sinc.

Proposition 3. The posterior distribution of a GP with centred sinc kernel concentrates on the Whittaker-Shannon interpolation formula [23, 31] with zero variance when the observations are noiseless and uniformly spaced at the Nyquist frequency [17].

Proof. Let us first consider n ∈ N observations taken at the Nyquist frequency with times tₙ = [t₁, . . . , tₙ] and values yₙ = [y₁, . . . , yₙ].
With this notation, the posterior GP-sinc is given by

p(f(t)|yₙ) = GP( SK(t, tₙ)ᵀ Λ⁻¹ yₙ , SK(t, t′) − SK(t, tₙ)ᵀ Λ⁻¹ SK(t′, tₙ) ),   (12)

where Λ = SK(tₙ, tₙ) is the covariance of the observations and SK(t, tₙ) denotes the vector of covariances with the term SK(t, tᵢ) = SK(t − tᵢ) in the i-th entry.

A key step in the proof is to note that the covariance matrix Λ is diagonal. This is because the difference between any two observation times tᵢ, tⱼ is a multiple of the inverse Nyquist frequency Δ⁻¹, and the sinc kernel vanishes at all those multiples except for i = j; see eq. (11). Therefore, replacing the inverse matrix Λ⁻¹ = σ⁻²Iₙ and the centred sinc kernel in eq. (11) into eq. (12) allows us to write the posterior mean and variance (choosing t = t′ above) respectively as

E[f(t)|yₙ] = Σ_{i=1}^{n} yᵢ sinc(Δ(t − tᵢ)),   V[f(t)|yₙ] = σ² ( 1 − Σ_{i=1}^{n} sinc²(Δ(t − tᵢ)) ).   (13)

For the first part of the proof, we can just apply lim_{n→∞} to the posterior mean and readily identify the Shannon-Whittaker interpolation formula: a convolution between the sinc function and the observations.

To show that the posterior variance vanishes as n → ∞, we proceed by showing that the Fourier transform of the sum of squared sinc functions in eq. (13) converges to a Dirac delta (at zero) of unit magnitude, as these are equivalent statements. Denote by tri(·) the triangular function and observe that

F{ Σ_{i=1}^{∞} sinc²(Δ(t − tᵢ)) } = F{sinc²(Δt)} · F{ Σ_{i=1}^{∞} δ_{tᵢ} }      (convolution definition & theorem)
                                 = (1/Δ) tri(ξ/Δ) · Δ Σ_{i=1}^{∞} δ_{iΔ}       (Fourier pairs of sinc²(·) and δ(·))
                                 = δ₀(ξ),   (14)

where the last line follows from the fact that, out of all the Dirac deltas in the summation, the only one that falls on the support of the triangular function (of width 2Δ) is the one at the origin, δ₀(ξ).

The above result opens perspectives for analysing GPs' reconstruction errors, which are much needed in the GP literature. This is because a direct consequence of Proposition 3 is a quantification of the number of observations required for zero posterior variance (or reconstruction error). This is instrumental to design sparse GPs where the number of inducing variables is chosen with a sound metric in mind: proximity to the Nyquist frequency. Finally, extending the above result to the non-baseband case can be achieved through frequency modulation, the focus of the next section.

4 Stereo amplitude modulation with GP-sinc

We can investigate the relationship between trajectories of GPs both for non-centred (eq. (5)) and centred (eq. (11)) sinc kernels using a latent factor model. Specifically, let us consider two i.i.d. GP-sinc processes x₁, x₂ ∼ GP(0, σ² sinc(Δt)) with centred sinc kernel and construct the factor model

x(t) = x₁(t) cos(2πξ₀t) + x₂(t) sin(2πξ₀t).   (15)

Observe that, due to independence and linearity, the process x in eq. (15) is a GP with zero mean and covariance given by a non-centred sinc kernel²

Kₓ(t, t′) = E[x(t)x(t′)] = σ² sinc(Δ(t − t′)) cos(2πξ₀(t − t′)) = SK(t − t′).   (16)

This result can also be motivated by the following decomposition of the sinc kernel:

SK(t − t′) = [cos(2πξ₀t)  sin(2πξ₀t)] [ σ² sinc(Δ(t − t′))  0 ; 0  σ² sinc(Δ(t − t′)) ] [cos(2πξ₀t′)  sin(2πξ₀t′)]ᵀ.   (17)

²This follows directly from the identity cos(α₁ − α₂) = cos(α₁) cos(α₂) + sin(α₁) sin(α₂), choosing αᵢ = 2πξ₀tᵢ for i = 1, 2.

The above matrix can be interpreted as the covariance of a multioutput GP [16, 1], where the two channels x₁, x₂ are independent due to the block-diagonal structure. Then, the trajectories of the non-centred sinc kernel can be simulated by: (i) sampling the two channels in this MOGP, (ii) multiplying one of them by a sine and the other one by a cosine, and finally (iii) summing them together.

The outlined relationship between centred and non-centred sinc trajectories is of particular interest in stereo modulation/demodulation [18] applications from a Bayesian nonparametric perspective. This is because we can identify the two independent draws from the centred sinc kernel as lower-frequency signals containing information (such as stereo audio, bivariate sensors, or two-subject sensors) and the deterministic higher-frequency sine and cosine signals as a carrier. In this setting, since the paths of a GP-sinc are equal (in probability) to those of the factor model presented in eq.
(15), we can consider the GP-sinc as a generative model for stereo amplitude modulation.

Recall that the very objective in stereo demodulation is to recover the latent information signals, henceforth referred to as channels, at the receiver's end from (possibly corrupted) observations. In this regard, the sinc kernel represents a unique contribution, since Bayesian signal recovery under noisy/missing observations is naturally handled by GP models. In simple terms, for a stereo-modulated signal with carrier frequency ξ₀ and bandwidth Δ, the posterior over channels {xᵢ}ᵢ₌₁,₂ wrt an observation x (of the modulated signal) is jointly Gaussian and given by

p(xᵢ(t)|x) = GP( K_{xᵢ,x}(t)ᵀ Λ⁻¹ x , K_{xᵢ}(t − t′) − K_{xᵢ,x}(t)ᵀ Λ⁻¹ K_{x,xᵢ}(t′) ),   (18)

where Λ = SK(t, t) + I σ²_noise is the covariance of the observations, K_{xᵢ}(t − t′) is the prior covariance of channel xᵢ(t), and K_{xᵢ,x}(t) is the covariance between the observations x and channel xᵢ(t), given by

K_{xᵢ,x}(t) = E[xᵢ(t)x(t)] = σ² sinc(Δ(t − t)) cos(2πξ₀t),   (19)

where we have used the same notation as in eq. (12).

Fig. 2 shows an illustrative implementation of GP-sinc demodulation, where the associated channels were recovered from non-uniform observations of a GP-sinc trajectory.

Figure 2: Demodulation using the sinc kernel. Left: A draw from a GP with noncentred sinc kernel (information "times" carrier). Right: Posterior of the stereo channels with latent modulated signal in light grey.

5 Bayesian band-pass filtering with GP-sinc

In signal processing, the extraction of a frequency-specific part of a signal is referred to as band-pass filtering [9]; accordingly, low-pass and high-pass filtering refer to extracting the low (centred around zero) and high frequency components respectively. We next show that the sinc kernel in eq.
(5) has appealing features to address band-pass filtering from a Bayesian standpoint, that is, to find the posterior distribution of a frequency-specific component conditional on noisy and missing observations. For the specific low-pass filtering setting, see [30].

We formulate the filtering setting as follows. Let us consider a signal given by the mixture

x(t) = x_band(t) + x_else(t),   (20)

where x_band and x_else correspond to independent GPs only containing energy at frequencies inside and outside the band of interest respectively. We can then denote the PSD of x(t) by S(ξ) and those of the components by S_band(ξ) and S_else(ξ) respectively. Therefore, our assumption of independence of the components x_band(t) and x_else(t) results in S(ξ) = S_band(ξ) + S_else(ξ), where S_band(ξ) and S_else(ξ) have non-overlapping, or disjoint, support. An illustration of these PSDs is shown in Fig. 3.

Figure 3: Illustration of PSDs in the band-pass filtering setting: The area inside the black line is the PSD of the process x, whereas the regions in blue and red denote the PSDs of the band component x_band (S_band) and the frequencies outside the band x_else (S_else) respectively. Choosing a = 0 recovers the low-pass setting.

Notice that the above framework is sufficiently general in the sense that we only require that there is a part of the signal in which we are interested, namely x_band(t), and the rest. Critically, we have not imposed any requirements on the kernel of the complete signal x.
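The mixture in eq. (20) is straightforward to simulate by drawing two independent GP-sinc paths whose bands do not overlap; the band placements, grid and seed below are illustrative assumptions, not values used in the paper's experiments.

```python
import numpy as np

def sinc_gram(t, sigma2, delta, xi0):
    # Gram matrix of the sinc kernel, eq. (5).
    tau = t[:, None] - t[None, :]
    return sigma2 * np.sinc(delta * tau) * np.cos(2 * np.pi * xi0 * tau)

rng = np.random.default_rng(0)
t = np.linspace(0, 200, 1000)
jitter = 1e-8 * np.eye(len(t))  # numerical stabiliser for the rank-deficient Gram

# Two independent band-limited components with disjoint PSD supports:
# x_band on [0.4, 0.6] Hz and x_else on [0.0, 0.3] Hz (illustrative bands).
x_band = rng.multivariate_normal(np.zeros(len(t)), sinc_gram(t, 1.0, 0.2, 0.5) + jitter)
x_else = rng.multivariate_normal(np.zeros(len(t)), sinc_gram(t, 1.0, 0.3, 0.15) + jitter)
x = x_band + x_else  # the observed mixture of eq. (20)

# Empirically, the energy of x_band concentrates in its nominal band.
freqs = np.fft.rfftfreq(len(t), d=t[1] - t[0])
power = np.abs(np.fft.rfft(x_band)) ** 2
in_band = np.abs(freqs - 0.5) <= 0.11     # band plus a small leakage margin
frac = power[in_band].sum() / power.sum()
print(frac)                               # typically close to 1
```

The residual out-of-band energy is only spectral leakage from the finite observation window, consistent with the almost-sure band-limitedness discussed in Section 2.2.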
Due to the joint Gaussianity of x and x_band, the Bayesian estimate of the band-pass filtering problem, conditional on a set of observations x, is given by a GP posterior distribution, the statistics of which will be given by the covariances of x and x_band. Since S_band can be expressed as the PSD of x times the symmetric rectangle introduced in eq. (4), we can observe that the covariance of x_band is given by the generalised sinc kernel presented in eq. (9) and, therefore, it can be computed via the inverse Fourier transform:

K_band(t) = F⁻¹{S_band(ξ)} = F⁻¹{S(ξ) simrect_{a,b}(ξ)} = K(t) ⋆ SK(t),   (21)

where K(t) denotes the covariance kernel of x. Recall that this expression can be computed relying on the Riemann-sum approximation for the convolution presented in Sec. 2.3. Then, the marginal covariance of x_band can be computed from the assumption of independence³:

V[x(t), x_band(t′)] = E[x_band(t)x_band(t′)] + E[x_else(t)x_band(t′)] = K_band(t − t′),   (22)

where the second expectation vanishes due to the independence of x_band and x_else.

In realistic filtering scenarios we only have access to noisy observations y = [y₁, . . . , yₙ] at times t = [t₁, . . . , tₙ]. Assuming white Gaussian observation noise with variance σ²_noise, independent of x, the posterior of x_band is given by

p(x_band(t)|y) = GP( K_band(t − t)ᵀ Λ⁻¹ y , K_band(t − t′) − K_band(t − t)ᵀ Λ⁻¹ K_band(t′ − t) ),   (23)

where Λ = K(t, t) + σ²_noise I is the covariance of the observations; recall that K_band(t) = K(t) ⋆ SK(t) from eq. (21).

To conclude this section, notice that the proposed sinc-kernel-based Bayesian approach to band-pass filtering is consistent with classical practice. In fact, if no statistical knowledge of the process x were available, we can simply assume that the process is uncorrelated and the observations are noiseless. This is equivalent to setting K(t) = δ₀(t), Λ = I, and K_band(t) = SK(t); therefore, we recover the "brick-wall" [18] filter:

x̂_band(t) = Σ_{i=1}^{n} yᵢ sinc(Δ(t − tᵢ)) cos(2πξ₀(t − tᵢ)).   (24)

6 Experiments

We validated the ability of the proposed sinc kernel to address, in probabilistic terms, the problems of (i) band-limited reconstruction, (ii) demodulation and (iii) band-pass filtering using real-world data. All examples included unevenly-sampled observations.

6.1 Reconstruction of a band-limited audio signal

We considered an audio recording from the TIMIT repository [6]. The signal, originally sampled at 16kHz, was low-pass filtered using a brick-wall filter at 750Hz. We focused on the reconstruction setting using only 200 (out of 1000) observations with added Gaussian noise of standard deviation equal to 10% of that of the audio signal. Fig. 4 shows the PSDs of the true and GP-sinc reconstructed signals (mean and sample trajectories), where it can be seen that the proposed reconstruction faithfully follows the spectral content of the original signal, i.e., it does not introduce unwanted frequency components.

³We can extend this model and assume that x_band and x_else are correlated; this is direct from the MOGP literature that designs covariance functions between GPs.

Figure 4: Band-limited reconstruction using GP-sinc: PSDs (left) and temporal reconstruction (right)

For comparison, we also reconstructed the band-limited audio signal with a GP with spectral mixture kernel (GP-SM) and a cubic spline. Fig. 5 shows the PSDs of the complete signal in red and those of the reconstructions in blue for GP-SM (left) and the cubic spline (right). Notice how the proposed GP-sinc (Fig.
4, left) outperformed GP-SM and the spline due to its rectangular PSD, which allows frequencies with high and zero energy to be arbitrarily close, unlike the PSD of GP-SM, which cannot exhibit such a sharp decay.

Figure 5: Reconstruction of a band-limited audio signal using GP-SM (left) and cubic spline (right). Ground truth PSD is shown in red and reconstructions in blue.

6.2 Demodulation of two heart-rate signals

We considered two heart-rate signals from the MIT-BIH Database [7], upsampled from 2Hz to 10Hz, corresponding to two different subjects, which can thus be understood as statistically independent. We then composed a stereo-modulated signal using a carrier of frequency 2Hz (most of the power of the heart-rate signals is contained below 1Hz), and used a subset of 1200 (out of 9000) observations with added noise of standard deviation equal to 20% of that of the modulated signal. Fig. 6 shows the 35-run 10-90 percentiles of the reconstruction error for both channels versus the average sampling frequency (recall that these are unevenly-sampled series), and the temporal reconstruction for a sampling frequency equal to 0.167. Notice how the reconstruction of the channels reaches a plateau for frequencies greater than 0.06, indicating that oversampling does not improve performance, in line with Proposition 3. The discrepancy in reconstruction error stems from the richer spectrum of channel 1.

Figure 6: Heart-rate demodulation using GP-sinc: error (left) and reconstruction (right).

6.3 Band-pass filtering of CO2 concentration

We implemented GP-sinc for extracting the 1-year periodicity component of the well-known Mauna-Loa monthly CO2 concentration series. We used 200 (out of 727) observations, that is, an average sampling rate of 0.275[month⁻¹] ≈ 3.3[year⁻¹], which is above the Nyquist frequency for the desired component. Fig.
7 shows both the unfiltered and the GP-sinc filtered PSDs (left), and the latent signal, observations and band-pass version using GP-sinc with ξ₀ = 1 [year⁻¹] and Δ = 0.1. Notice that, as desired, the GP-sinc band-pass filter was able to recover the yearly component from non-uniformly acquired observations.

Figure 7: Bandpass filtering of Mauna-Loa monthly CO2 concentration using GP-sinc.

6.4 Generalised sinc kernel and Nyquist-based sparse implementation

Lastly, we implemented the generalised sinc kernel (GSK) in eq. (9), i.e., a sinc mixture, using a sparse approximation where inducing locations are chosen according to the Nyquist frequency (see Sec. 3). We trained a GP with the GSK kernel using the heart-rate signal from the MIT-BIH database, where we simulated regions of missing data. Fig.
8 shows the PSD on the left (components in colours and the GSK in red), the resulting sum-of-sincs kernel in the centre, and the time series (ground truth, observations and reconstruction) on the right. Notice from the right plot that although N = 600 observations were considered (black dots), only M = 54 inducing locations (blue crosses) were needed, since they are chosen based on the extent of the support of the (trained) PSD (Sec. 3).

Figure 8: Implementation of the generalised sinc kernel (sum of sincs) and Nyquist-based sparse approximation using a heart-rate signal. From left to right: PSDs (components in colour and sum in red), resulting GSK kernel, and heart-rate signal.

7 Discussion

We have proposed a novel stationary covariance kernel for Gaussian processes (GPs), named the sinc kernel, which generates trajectories with a band-limited spectrum. This has been achieved by parametrising the GP's power spectral density as a rectangular function and then applying the inverse Fourier transform. In addition to its use in GP training and prediction, the properties of the proposed kernel have been illuminated in the light of the classical spectral representation framework. This allowed us to interpret the role of the sinc kernel in infinite mixtures of sinusoids, Nyquist reconstruction, stereo amplitude modulation and band-pass filtering. From theoretical, illustrative and experimental standpoints, we have validated both the novelty of the proposed approach and its consistency with the mature literature on spectral estimation.
Future research lines include exploiting the features of the sinc kernel for sparse inter-domain GP approximations [14] and spectral estimation [25], understanding the reconstruction-error rates for general kernels following the results of Section 3, and comparing general kernels via a mixture of sinc kernels as suggested in Section 2.3.

Acknowledgments

This work was funded by the projects Conicyt-PIA #AFB170001 Center for Mathematical Modeling and Fondecyt-Iniciación #11171165.

References

[1] M. A. Álvarez, L. Rosasco, and N. D. Lawrence. Kernels for vector-valued functions: A review. Found. Trends Mach. Learn., 4(3):195–266, March 2012.

[2] S. Bochner, M. Tenenbaum, and H. Pollard. Lectures on Fourier Integrals. Princeton University Press, 1959.

[3] R. Boloix-Tortosa, J. J. Murillo-Fuentes, F. J. Payán-Somet, and F. Pérez-Cruz. Complex Gaussian processes for regression. IEEE Transactions on Neural Networks and Learning Systems, 29(11):5499–5511, 2018.

[4] G. L. Bretthorst. Bayesian Spectrum Analysis and Parameter Estimation. Lecture Notes in Statistics. Springer, 1988.

[5] Y. Gal and R. Turner. Improving the Gaussian process sparse spectrum approximation by representing uncertainty in frequency inputs. In Proc. of ICML, pages 655–664, 2015.

[6] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, and D. S. Pallett.
DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1. NASA STI/Recon Technical Report N, 93, 1993.

[7] A. L. Goldberger and D. R. Rigney. Theory of Heart: Biomechanics, Biophysics, and Nonlinear Dynamics of Cardiac Function, chapter Nonlinear dynamics at the bedside, pages 583–605. Springer-Verlag, 1991.

[8] P. C. Gregory. A Bayesian revolution in spectral analysis. AIP Conference Proceedings, 568(1):557–568, 2001.

[9] S. S. Haykin. Adaptive Filter Theory. Pearson Education India, 2008.

[10] J. Hensman, N. Durrande, and A. Solin. Variational Fourier features for Gaussian processes. Journal of Machine Learning Research, 18(151):1–52, 2018.

[11] E. T. Jaynes. Bayesian spectrum and chirp analysis. In Maximum-Entropy and Bayesian Spectral Analysis and Estimation Problems, pages 1–37. Springer, 1987.

[12] E. T. Jaynes. Probability Theory: The Logic of Science. Cambridge University Press, 2003.

[13] S. Kay. Modern Spectral Estimation: Theory and Application. Prentice Hall, 1988.

[14] M. Lázaro-Gredilla and A. Figueiras-Vidal. Inter-domain Gaussian processes for sparse inference using inducing features. In Advances in Neural Information Processing Systems 22, pages 1087–1095. Curran Associates, Inc., 2009.

[15] M. Lázaro-Gredilla, J. Quiñonero-Candela, C. E. Rasmussen, and A. R. Figueiras-Vidal. Sparse spectrum Gaussian process regression. Journal of Machine Learning Research, 11(Jun):1865–1881, 2010.

[16] A. Melkumyan and F. Ramos. Multi-kernel Gaussian processes. In Proc. of IJCAI, pages 1408–1413. AAAI Press, 2011.

[17] H. Nyquist. Certain topics in telegraph transmission theory. Transactions of the American Institute of Electrical Engineers, 47(2):617–644, 1928.

[18] A. V. Oppenheim, A. S. Willsky, and S. Hamid. Signals and Systems. Pearson, 1996.

[19] G. Parra and F. Tobar.
Spectral mixture kernels for multi-output Gaussian processes. In Advances in Neural Information Processing Systems 30, pages 6681–6690. Curran Associates, Inc., 2017.

[20] M. J. D. Powell. An efficient method for finding the minimum of a function of several variables without calculating derivatives. The Computer Journal, 7(2):155–162, 1964.

[21] C. Rasmussen and C. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.

[22] S. Remes, M. Heinonen, and S. Kaski. Non-stationary spectral kernels. In Advances in Neural Information Processing Systems 30, pages 4642–4651. Curran Associates, Inc., 2017.

[23] C. E. Shannon. Communication in the presence of noise. Proceedings of the Institute of Radio Engineers, 37(1):10–21, 1949.

[24] P. Stoica and R. L. Moses. Spectral Analysis of Signals. Pearson Prentice Hall, Upper Saddle River, NJ, 2005.

[25] F. Tobar. Bayesian nonparametric spectral estimation. In Advances in Neural Information Processing Systems 31, pages 10148–10158, 2018.

[26] F. Tobar, T. Bui, and R. Turner. Design of covariance functions using inter-domain inducing variables. In NIPS 2015 - Time Series Workshop, 2015.

[27] F. Tobar, T. Bui, and R. Turner. Learning stationary time series using Gaussian processes with nonparametric kernels. In Advances in Neural Information Processing Systems 28, pages 3501–3509. Curran Associates, Inc., 2015.

[28] F. Tobar and R. Turner. Modelling of complex signals using Gaussian processes. In Proc. of IEEE ICASSP, pages 2209–2213, 2015.

[29] K. R. Ulrich, D. E. Carlson, K. Dzirasa, and L. Carin. GP kernels for cross-spectrum analysis. In Advances in Neural Information Processing Systems 28, pages 1999–2007. Curran Associates, Inc., 2015.

[30] C. Valenzuela and F. Tobar. Low-pass filtering as Bayesian inference. In Proc. of IEEE ICASSP, pages 3367–3371, 2019.

[31] E. T.
Whittaker. On the functions which are represented by the expansions of the interpolation-theory. Proceedings of the Royal Society of Edinburgh, 35:181–194, 1915.

[32] A. G. Wilson and R. P. Adams. Gaussian process kernels for pattern discovery and extrapolation. In Proc. of ICML, pages 1067–1075, 2013.

[33] S. J. Wright and J. Nocedal. Numerical Optimization. Springer, 1999.