{"title": "Adding Constrained Discontinuities to Gaussian Process Models of Wind Fields", "book": "Advances in Neural Information Processing Systems", "page_first": 861, "page_last": 867, "abstract": null, "full_text": "Adding Constrained Discontinuities to Gaussian Process Models of Wind Fields\n\nDan Cornford*\n\nIan T. Nabney\n\nChristopher K. I. Williams\u2020\n\nNeural Computing Research Group\n\nAston University, BIRMINGHAM, B4 7ET, UK\n\nd.cornford@aston.ac.uk\n\nAbstract\n\nGaussian Processes provide good prior models for spatial data, but can be too smooth. In many physical situations there are discontinuities along bounding surfaces, for example fronts in near-surface wind fields. We describe a modelling method for such a constrained discontinuity and demonstrate how to infer the model parameters in wind fields with MCMC sampling.\n\n1 INTRODUCTION\n\nWe introduce a model for wind fields based on Gaussian Processes (GPs) with 'constrained discontinuities'. GPs provide a flexible framework for modelling various systems. They have been adopted in the neural network community and are interpreted as placing priors over functions.\n\nStationary vector-valued GP models (Daley, 1991) can produce realistic wind fields when run as a generative model; however, the resulting wind fields do not contain some features typical of the atmosphere. The most difficult features to include are surface fronts. Fronts are generated by complex atmospheric dynamics and are marked by large changes in the surface wind direction (see for example Figures 2a and 3b) and temperature. In order to account for such features, which appear discontinuous at our observation scale, we have developed a model for vector-valued GPs with constrained discontinuities which could also be applied to surface reconstruction in computer vision and to geostatistics.\n\nIn section 2 we illustrate the generative model for wind fields with fronts. 
Section 3 explains what we mean by GPs with constrained discontinuities and derives the likelihood of data under the model. Results of Bayesian estimation of the model parameters are given, using a Markov Chain Monte Carlo (MCMC) procedure. In the final section, the strengths and weaknesses of the model are discussed and improvements suggested.\n\n*To whom correspondence should be addressed.\n\n\u2020Now at: Division of Informatics, University of Edinburgh, 5 Forrest Hill, Edinburgh EH1 2QL, Scotland, UK\n\n2 A GENERATIVE WIND FIELD MODEL\n\nWe are primarily interested in retrieving wind fields from satellite scatterometer observations of the ocean surface\u00b9. A probabilistic prior model for wind fields will be used in a Bayesian procedure to resolve ambiguities in local predictions of wind direction. The generative model for a wind field including a front is taken to be a combination of two vector-valued GPs with a constrained discontinuity.\n\nA common method for representing wind fields is to put GP priors over the velocity potential $\\Phi$ and stream function $\\Psi$, assuming the processes are uncorrelated (Daley, 1991). The horizontal wind vector $\\mathbf{u} = (u, v)$ can then be derived from:\n\n$$u = -\\frac{\\partial \\Psi}{\\partial y} + \\frac{\\partial \\Phi}{\\partial x}, \\qquad v = \\frac{\\partial \\Psi}{\\partial x} + \\frac{\\partial \\Phi}{\\partial y}. \\qquad (1)$$\n\nThis produces good prior models for wind fields when a suitable choice of covariance function for $\\Phi$ and $\\Psi$ is made. We have investigated using a modified Bessel function based covariance\u00b2 (Handcock and Wallis, 1994) but found, using three years of wind data for the North Atlantic, that the maximum a posteriori value for the smoothness parameter\u00b3 in this covariance function was $\\sim 2.5$. Thus we used the correlation function:\n\n$$\\rho(r) = \\left(1 + \\frac{r}{L} + \\frac{r^2}{3L^2}\\right) \\exp\\left(-\\frac{r}{L}\\right) \\qquad (2)$$\n\nwhere $L$ is the correlation length scale. This form is equivalent to the modified Bessel function covariance with smoothness parameter 2.5 and is less computationally demanding (Cornford, 1998).\n\n[Figure 1: (a) flowchart with three steps: simulate frontal position, orientation and direction; simulate along both sides of the front using GP$_1$; simulate wind fields either side of the front conditionally on that side's frontal winds using GP$_2$. (b) schematic of the front relative to the origin and north (N).]\n\nFigure 1: (a) Flowchart describing the generative frontal model. See text for full description. (b) A description of the frontal model.\n\n\u00b9See http://www.ncrg.aston.ac.uk/Projects/NEUROSAT/NEUROSAT.html for details of the scatterometer work. Technical reports describing, in more detail, methods for generating prior wind field models can also be accessed from the same page.\n\n\u00b2The modified Bessel function allows us to control the differentiability of the sample realisations through the 'smoothness parameter', as well as the length scales and variances.\n\n\u00b3This varies with season, but is the most temporally stable parameter in the covariance function.\n\nThe generative model has the form outlined in Figure 1a. Initially the frontal position and orientation are simulated. They are defined by the angle clockwise from north ($\\phi_f$) that the front makes and a point on the line ($x_f$, $y_f$). Having defined the position of the front, the angle of the wind across the front ($\\alpha_f$) is simulated from a distribution covering the range $[0, \\pi)$. This angle is related to the vertical component of vorticity ($\\zeta$) across the front through $\\zeta = \\mathbf{k} \\cdot \\nabla \\times \\mathbf{u} \\propto \\cos(\\alpha_f / 2)$ and the constraint $\\alpha_f \\in [0, \\pi)$ ensures cyclonic vorticity at the front. It is assumed that the front bisects $\\alpha_f$. The wind speed ($s_f$) is then simulated at the front. 
Since there is generally little change in wind speed across the front, one value is simulated for both sides of the front. These components $\\theta_f = (\\phi_f, x_f, y_f, \\alpha_f, s_f)$ define the line of the front and the mean wind vectors just ahead of and just behind the front (Figure 1b):\n\nA realistic model requires some variability in wind vectors along the front. Thus we use a GP with a non-zero mean ($\\mathbf{m}_{1a}$ or $\\mathbf{m}_{1b}$) along the line of the front. In the real atmosphere we observe a smaller variability in the wind vectors along the line of the front compared with regions away from fronts. Thus we use different GP parameters along the front (GP$_1$), from those used in the wind field away from the front (GP$_2$), although the same GP$_1$ parameters are used both sides of the front, just with different means. The winds just ahead of and behind the front are assumed conditionally independent given $\\mathbf{m}_{1a}$ and $\\mathbf{m}_{1b}$, and are simulated at a regular 50 km spacing. The final step in the generative model is to simulate wind vectors using GP$_2$ in both regions either side of the front, conditionally on the values along that side of the front. This model is flexible enough to represent fronts, yet has the required constraints derived from meteorological principles, for example that fronts should always be associated with cyclonic vorticity and that discontinuities at the model scale should be in wind direction but not in wind speed\u2074. To make this generative model useful for inference, we need to be able to compute the data likelihood, which is the subject of the next section.\n\n3 GPs WITH CONSTRAINED DISCONTINUITIES\n\nFigure 2: (a) The discontinuity in one of the vector components in a simulation. (b) Framework for GPs with boundary conditions. The curve $D_1$ has $n_1$ sample points with values $Z_1$. The domain $D_2$ has $n_2$ points with values $Z_2$. 
\n\n\u2074The model allows small discontinuities in wind speed, which are consistent with frontal dynamics.\n\nWe consider data from two domains $D_1$ and $D_2$ (Figure 2b), where in this case $D_1$ is a curve in the plane which is intended to be the front and $D_2$ is a region of the plane. We obtain $n_1$ variables $Z_1$ at points $X_1$ along the curve, and we assume these are generated under GP$_1$ (a GP which depends on parameters $\\theta_1$ and has mean $\\mathbf{m}_1 = m_1 \\mathbf{1}$ which will be determined by (3) or (4)). We are interested in determining the likelihood of the variables $Z_2$ observed at $n_2$ points $X_2$ under GP$_2$ which depends on parameters $\\theta_2$, conditioned on the 'constrained discontinuities' at the front.\n\nWe evaluate this by calculating the likelihood of $Z_2$ conditioned on the $n_1$ values of $Z_1$ from GP$_1$ along the front and marginalising out $Z_1$:\n\n$$p(Z_2 | \\theta_2, \\theta_1) = \\int_{-\\infty}^{\\infty} p(Z_2 | Z_1, \\theta_2, \\theta_1, \\mathbf{m}_1) \\, p(Z_1 | \\theta_1, \\mathbf{m}_1) \\, dZ_1. \\qquad (5)$$\n\nFrom the definition of the likelihood of a GP (Cressie, 1993) we find:\n\n$$p(Z_2 | Z_1, \\theta_2, \\theta_1, \\mathbf{m}_1) = \\frac{1}{(2\\pi)^{n_2/2} |S_{22}|^{1/2}} \\exp\\left(-\\frac{1}{2} {Z_2^{*}}' S_{22}^{-1} Z_2^{*}\\right) \\qquad (6)$$\n\nwhere:\n\n$$S_{22} = K_{22|2} - K_{12|2}' K_{11|2}^{-1} K_{12|2}, \\qquad Z_2^{*} = Z_2 - K_{12|2}' K_{11|2}^{-1} Z_1.$$\n\nTo understand the notation consider the joint distribution of $Z_1$, $Z_2$ and in particular its covariance matrix:\n\n$$K = \\begin{pmatrix} K_{11|2} & K_{12|2} \\\\ K_{12|2}' & K_{22|2} \\end{pmatrix} \\qquad (7)$$\n\nwhere $K_{11|2}$ is the $n_1 \\times n_1$ covariance matrix between the points in $D_1$ evaluated using $\\theta_2$, $K_{12|2} = K_{21|2}'$ the $n_1 \\times n_2$ (cross) covariance matrix between the points in $D_1$ and $D_2$ evaluated using $\\theta_2$ and $K_{22|2}$ is the usual $n_2 \\times n_2$ covariance for points in $D_2$. Thus we can see that $S_{22}$ is the $n_2 \\times n_2$ modified covariance for the points in $D_2$ given the points along $D_1$, while $Z_2^{*}$ is $Z_2$ corrected for the conditional mean implied by the values at the points in $D_1$, which have non-zero mean.\n\nWe remove the dependency on the values $Z_1$ by evaluating the integral in (5). $p(Z_1 | \\theta_1, \\mathbf{m}_1)$ is given by:\n\n$$p(Z_1 | \\theta_1, \\mathbf{m}_1) = \\frac{1}{(2\\pi)^{n_1/2} |K_{11|1}|^{1/2}} \\exp\\left(-\\frac{1}{2} (Z_1 - \\mathbf{m}_1)' K_{11|1}^{-1} (Z_1 - \\mathbf{m}_1)\\right) \\qquad (8)$$\n\nwhere $K_{11|1}$ is the $n_1 \\times n_1$ covariance matrix between the points in $D_1$ evaluated under the covariance given by $\\theta_1$. Completing the square in $Z_1$ in the exponent, the integral (5) can be evaluated to give:\n\n$$p(Z_2 | \\theta_2, \\theta_1, \\mathbf{m}_1) = \\frac{1}{(2\\pi)^{n_2/2} |S_{22}|^{1/2} |K_{11|1}|^{1/2} |B|^{1/2}} \\exp\\left(\\frac{1}{2} \\left(C' B^{-1} C - Z_2' S_{22}^{-1} Z_2 - \\mathbf{m}_1' K_{11|1}^{-1} \\mathbf{m}_1\\right)\\right) \\qquad (9)$$\n\nwhere:\n\n$$B = (K_{12|2}' K_{11|2}^{-1})' S_{22}^{-1} K_{12|2}' K_{11|2}^{-1} + K_{11|1}^{-1}, \\qquad C' = Z_2' S_{22}^{-1} K_{12|2}' K_{11|2}^{-1} + \\mathbf{m}_1' K_{11|1}^{-1}.$$\n\nThe algorithm has been coded in MATLAB and can deal with reasonably large numbers of points quickly. For a two dimensional vector-valued GP with $n_1 = 12$ and $n_2 = 200$\u2075 and a covariance function given by (2), computation of the log likelihood takes 4.13 seconds on an SGI Indy R5000.\n\n\u2075This is equivalent to $n_1 = 24$ and $n_2 = 400$ for a scalar GP.\n\nThe mean values just ahead of and behind the front define the mean values for the constrained discontinuity (i.e. $\\mathbf{m}_1$ in (9)). Conditional on the frontal parameters the wind fields either side (Figure 3a) are assumed independent:\n\n$$p(Z_{2a}, Z_{2b} | \\theta_2, \\theta_1, \\theta_f) = p(Z_{2a} | \\theta_2, \\theta_1, \\mathbf{m}_{1a}) \\, p(\\mathbf{m}_{1a} | \\theta_f) \\times p(Z_{2b} | \\theta_2, \\theta_1, \\mathbf{m}_{1b}) \\, p(\\mathbf{m}_{1b} | \\theta_f)$$\n\nwhere we have performed the integration (5) to remove the dependency on $Z_{1a}$ and $Z_{1b}$. Thus the likelihood of the data $Z_2 = (Z_{2a}, Z_{2b})$ given the model parameters $\\theta_2$, $\\theta_1$, $\\theta_f$ is simply the product of the likelihoods of two GPs with a constrained discontinuity which can be computed using (9).\n\nFigure 3: (a) The division of the wind field using the generative frontal model. $Z_{1a}$, $Z_{1b}$ are the wind fields just ahead of and behind the front, along its length, respectively. $Z_{2a}$, $Z_{2b}$ are the wind fields in the regions ahead of and behind the front respectively. (b) An example from the generative frontal model: the wind field looks like a typical 'cold front'.\n\nThe model outlined above was tested on simulated data generated from the model to assess parameter sensitivity. We generated a wind field $Z^o = (Z_{2a}, Z_{2b})$ using known model parameters (e.g. Figure 3b). We then sampled the model parameters from the posterior distribution:\n\n$$p(\\theta_2, \\theta_1, \\theta_f | Z^o) \\propto p(Z^o | \\theta_2, \\theta_1, \\theta_f) \\, p(\\theta_2) \\, p(\\theta_1) \\, p(\\theta_f) \\qquad (10)$$\n\nwhere $p(\\theta_2)$, $p(\\theta_1)$, $p(\\theta_f)$ are prior distributions over the parameters in the GPs and front models. This brings out one advantage of the proposed model. All the model parameters have a physical interpretation and thus expert knowledge was used to set priors which produce realistic wind fields. We will also use (10) to help set (hyper)priors using real data in $Z^o$.\n\nMCMC using the Metropolis algorithm (Neal, 1993) is used to sample from (10) using the NETLAB\u2076 library. Convergence of the Markov chain is currently assessed using visual inspection of the univariate sample paths since the generating parameters are known, although other diagnostics could be used (Cowles and Carlin, 1996). We find that the procedure is insensitive to the initial value of the GP parameters, but that the parameters describing the location of the front ($\\phi_f$, $d_f$) need to be initialised 'close' to the correct values if the chain is to converge on a reasonable time-scale. In the application some preliminary analysis of the wind field would be necessary to identify possible fronts and thus set the initial parameters to 'sensible' values. We intend to fit a vector-valued GP without any discontinuities and then measure the 'strain' or misfit of the locally predicted winds with the winds fitted by the GP. Lines of large 'strain' will be used to initialise the front parameters.\n\n\u2076Available from http://www.ncrg.aston.ac.uk/netlab/index.html.\n\nFigure 4: Examples from the Markov chain of the posterior distribution (10), plotted against sample number. (a) The energy = negative log posterior probability. Note that the energy when the chain was initialised was 2789 and the first 27 values are outside the range of the y-axis. (b) The angle of the front relative to north ($\\phi_f$).\n\nFigure 5: Examples from the Markov chain of the posterior distribution (10). (a) The angle of the wind across the front ($\\alpha_f$), plotted against sample number. (b) Histogram of the posterior distribution of $\\alpha_f$ (angle of wind, in radians) allowing a 10000 iteration burn-in period.\n\nExamples of samples from the Markov chain from the simulated wind field shown in Figure 3b can be seen in Figures 4 and 5. Figure 4a shows that the energy level (= negative log posterior probability) falls very rapidly to near its minimum value from its large starting value of 2789. In these plots the true parameters for the front were $\\phi_f = 0.555$, $\\alpha_f = 2.125$ while the initial values were set at $\\phi_f = 0.89$, $\\alpha_f = 1.49$. Other parameters were also initialised away from their true values. The Metropolis algorithm seems to be able to find the minimum and then stays in it.\n\nFigures 4b and 5a show the Markov chains for $\\phi_f$ and $\\alpha_f$. Both converge quickly to apparently stationary distributions, which have mean values very close to the 'true' generating parameters. The histogram of the distribution of $\\alpha_f$ is shown in Figure 5b. 
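As an illustrative aside, the marginal likelihood (9) derived earlier in this section can be checked numerically in a few lines. The sketch below is not the authors' MATLAB code. It assumes a scalar rather than vector-valued GP, parameterises each GP by a hypothetical pair (length scale, variance), uses made-up point locations and data values, and adds a small jitter term for numerical stability. It evaluates the log of (9) and compares it with the same quantity obtained directly from the Gaussian marginal of Z2, which has mean A m1 and covariance S22 + A K11|1 A', where A = K12|2' K11|2^-1.

```python
import numpy as np

def corr(r, L):
    # Correlation function (2): (1 + r/L + r^2 / (3 L^2)) * exp(-r/L)
    return (1.0 + r / L + r**2 / (3.0 * L**2)) * np.exp(-r / L)

def cov(Xa, Xb, theta):
    # theta = (length_scale, variance) is a hypothetical parameterisation
    L, var = theta
    r = np.linalg.norm(Xa[:, None, :] - Xb[None, :, :], axis=-1)
    return var * corr(r, L)

def blocks(X1, X2, theta1, theta2, jitter=1e-8):
    # Returns A = K12|2' K11|2^-1, S22 = K22|2 - A K12|2, and K11|1
    n1, n2 = len(X1), len(X2)
    K11_2 = cov(X1, X1, theta2) + jitter * np.eye(n1)
    K12_2 = cov(X1, X2, theta2)
    K22_2 = cov(X2, X2, theta2) + jitter * np.eye(n2)
    K11_1 = cov(X1, X1, theta1) + jitter * np.eye(n1)
    A = np.linalg.solve(K11_2, K12_2).T
    S22 = K22_2 - A @ K12_2
    return A, S22, K11_1

def log_lik_eq9(Z2, m1, A, S22, K11_1):
    # Log of equation (9), with B and C as defined below (9)
    n2 = len(Z2)
    K11_1_inv = np.linalg.inv(K11_1)
    S22_inv_A = np.linalg.solve(S22, A)
    B = A.T @ S22_inv_A + K11_1_inv
    C = S22_inv_A.T @ Z2 + K11_1_inv @ m1
    quad = (C @ np.linalg.solve(B, C)
            - Z2 @ np.linalg.solve(S22, Z2)
            - m1 @ K11_1_inv @ m1)
    logdet = sum(np.linalg.slogdet(M)[1] for M in (S22, K11_1, B))
    return -0.5 * n2 * np.log(2.0 * np.pi) - 0.5 * logdet + 0.5 * quad

def log_lik_direct(Z2, m1, A, S22, K11_1):
    # Same quantity from the Gaussian marginal Z2 ~ N(A m1, S22 + A K11|1 A')
    n2 = len(Z2)
    V = S22 + A @ K11_1 @ A.T
    d = Z2 - A @ m1
    return (-0.5 * n2 * np.log(2.0 * np.pi)
            - 0.5 * np.linalg.slogdet(V)[1]
            - 0.5 * d @ np.linalg.solve(V, d))

rng = np.random.default_rng(0)
# Four points on a straight front (y = 0) and a 3 x 2 grid of points off the front
X1 = np.column_stack([np.arange(4.0), np.zeros(4)])
gx, gy = np.meshgrid([0.5, 1.5, 2.5], [1.0, 2.0])
X2 = np.column_stack([gx.ravel(), gy.ravel()])
m1 = np.full(4, 1.5)        # constant non-zero mean along the front
Z2 = rng.normal(size=6)     # arbitrary data values, for illustration only
A, S22, K11_1 = blocks(X1, X2, theta1=(1.0, 0.5), theta2=(0.5, 1.0))
print(log_lik_eq9(Z2, m1, A, S22, K11_1), log_lik_direct(Z2, m1, A, S22, K11_1))
```

The two printed values should agree to numerical precision, which is a useful sanity check when re-implementing (9). For the full frontal model the same computation would be carried out once for each side of the front and the two log likelihoods added.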
\n\n4 DISCUSSION AND CONCLUSIONS\n\nSimulations from our model are meteorologically plausible wind fields which contain fronts. It is possible similar models could usefully be applied to other modelling problems where there are discontinuities with known properties. A method for the computation of the likelihood of data given two GP models, one with non-zero mean on the boundary and another in the domain in which the data is observed, has been given. This allows us to perform inference on the parameters in the frontal model using a Bayesian approach of sampling from the posterior distribution using an MCMC algorithm.\n\nThere are several weaknesses in the model specifically for fronts, which could be improved with further work. Real atmospheric fronts are not straight, thus the model would be improved by allowing 'curved' fronts. We could represent the position of the front, oriented along the angle defined by $\\phi_f$, using either another smooth GP, B-splines or possibly polynomials.\n\nCurrently the points along the line of the front are simulated at the mean observation spacing in the rest of the wind field ($\\sim 50$ km). Interesting questions remain about the (in-fill) asymptotics (Cressie, 1993) as the distance between the points along the front tends to zero. Empirical evidence suggests that as long as the spacing along the front is 'much less' than the length scale of the GP along the front (which is typically $\\sim 1000$ km) then the spacing does not significantly affect the results.\n\nAlthough we currently use a Metropolis algorithm for sampling from the posterior distribution, the derivative of (9) with respect to the GP parameters $\\theta_1$ and $\\theta_2$ could be computed analytically and used in a hybrid Monte Carlo procedure (Neal, 1993). 
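For reference, the random-walk Metropolis update that the hybrid Monte Carlo procedure would replace can be sketched as follows. This is a minimal illustration and not the NETLAB implementation. The energy function is a hypothetical stand-in for the negative log of the posterior (10): a narrow Gaussian centred on the generating values of the front angle and wind angle quoted in Section 3, with the wind angle restricted to [0, pi). The step size, toy posterior width, chain length and burn-in are all assumptions chosen for the example.

```python
import numpy as np

def metropolis(energy, x0, step, n_samples, rng):
    # Random-walk Metropolis; energy(x) = negative log posterior up to a constant
    x = np.asarray(x0, dtype=float)
    e = energy(x)
    samples = np.empty((n_samples, x.size))
    energies = np.empty(n_samples)
    for i in range(n_samples):
        proposal = x + step * rng.standard_normal(x.size)
        e_prop = energy(proposal)
        # Accept with probability min(1, exp(e - e_prop))
        if np.log(rng.uniform()) < e - e_prop:
            x, e = proposal, e_prop
        samples[i] = x
        energies[i] = e
    return samples, energies

# (phi_f, alpha_f) generating values quoted in the text
TRUE = np.array([0.555, 2.125])

def toy_energy(x):
    # Hypothetical stand-in for the negative log of (10): a narrow Gaussian
    # around TRUE, with alpha_f restricted to [0, pi)
    if not (0.0 <= x[1] < np.pi):
        return np.inf
    return 0.5 * np.sum(((x - TRUE) / 0.05) ** 2)

rng = np.random.default_rng(1)
# Start from the initial values quoted in the text
samples, energies = metropolis(toy_energy, x0=[0.89, 1.49], step=0.05,
                               n_samples=20000, rng=rng)
posterior_mean = samples[2000:].mean(axis=0)   # discard a burn-in period
print(posterior_mean, energies[0], energies[-1])
```

In a hybrid Monte Carlo variant the Gaussian proposal would be replaced by a simulated trajectory driven by the gradient of the energy, which is where the analytic derivative of (9) would enter.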
\n\nThese improvements should lead to a relatively robust procedure for putting priors over wind fields which will be used with real data when retrieving wind vectors from scatterometer observations over the ocean.\n\nAcknowledgements\n\nThis work was partially supported by the European Union funded NEUROSAT programme (grant number ENV 4 CT96-0314) and also EPSRC grant GR/L03088 Combining Spatially Distributed Predictions from Neural Networks.\n\nReferences\n\nCornford, D. 1998. Flexible Gaussian Process Wind Field Models. Technical Report NCRG/98/017, Neural Computing Research Group, Aston University, Aston Triangle, Birmingham, UK.\n\nCowles, M. K. and B. P. Carlin 1996. Markov-Chain Monte-Carlo Convergence Diagnostics: A Comparative Review. Journal of the American Statistical Association 91, 883-904.\n\nCressie, N. A. C. 1993. Statistics for Spatial Data. New York: John Wiley and Sons.\n\nDaley, R. 1991. Atmospheric Data Analysis. Cambridge: Cambridge University Press.\n\nHandcock, M. S. and J. R. Wallis 1994. An Approach to Statistical Spatio-Temporal Modelling of Meteorological Fields. Journal of the American Statistical Association 89, 368-378.\n\nNeal, R. M. 1993. Probabilistic Inference Using Markov Chain Monte Carlo Methods. Technical Report CRG-TR-93-1, Department of Computer Science, University of Toronto. URL: http://www.cs.utoronto.ca/ ... radford.\n", "award": [], "sourceid": 1502, "authors": [{"given_name": "Dan", "family_name": "Cornford", "institution": null}, {"given_name": "Ian", "family_name": "Nabney", "institution": null}, {"given_name": "Christopher", "family_name": "Williams", "institution": null}]}