{"title": "Adding Constrained Discontinuities to Gaussian Process Models of Wind Fields", "book": "Advances in Neural Information Processing Systems", "page_first": 861, "page_last": 867, "abstract": null, "full_text": "Adding Constrained Discontinuities to Gaussian Process Models of Wind Fields\n\nDan Cornford*\n\nIan T. Nabney\n\nChristopher K. I. Williams\u2020\n\nNeural Computing Research Group\n\nAston University, BIRMINGHAM, B4 7ET, UK\n\nd.cornford@aston.ac.uk\n\nAbstract\n\nGaussian Processes provide good prior models for spatial data, but can be too smooth. In many physical situations there are discontinuities along bounding surfaces, for example fronts in near-surface wind fields. We describe a modelling method for such a constrained discontinuity and demonstrate how to infer the model parameters in wind fields with MCMC sampling.\n\n1 INTRODUCTION\n\nWe introduce a model for wind fields based on Gaussian Processes (GPs) with 'constrained discontinuities'. GPs provide a flexible framework for modelling various systems. They have been adopted in the neural network community and are interpreted as placing priors over functions.\n\nStationary vector-valued GP models (Daley, 1991) can produce realistic wind fields when run as a generative model; however, the resulting wind fields do not contain some features typical of the atmosphere. The most difficult features to include are surface fronts. Fronts are generated by complex atmospheric dynamics and are marked by large changes in the surface wind direction (see for example Figures 2a and 3b) and temperature. To account for such features, which appear discontinuous at our observation scale, we have developed a model for vector-valued GPs with constrained discontinuities, which could also be applied to surface reconstruction in computer vision and to geostatistics.
\n\nIn section 2 we illustrate the generative model for wind fields with fronts. Section 3 explains what we mean by GPs with constrained discontinuities and derives the likelihood of data under the model. Results of Bayesian estimation of the model parameters are given, using a Markov Chain Monte Carlo (MCMC) procedure. In the final section, the strengths and weaknesses of the model are discussed and improvements suggested.\n\n*To whom correspondence should be addressed.\n\n\u2020Now at: Division of Informatics, University of Edinburgh, 5 Forrest Hill, Edinburgh EH1 2QL, Scotland, UK\n\n2 A GENERATIVE WIND FIELD MODEL\n\nWe are primarily interested in retrieving wind fields from satellite scatterometer observations of the ocean surface\u00b9. A probabilistic prior model for wind fields will be used in a Bayesian procedure to resolve ambiguities in local predictions of wind direction. The generative model for a wind field including a front is taken to be a combination of two vector-valued GPs with a constrained discontinuity.\n\nA common method for representing wind fields is to put GP priors over the velocity potential $\\Phi$ and stream function $\\Psi$, assuming the processes are uncorrelated (Daley, 1991). The horizontal wind vector $\\mathbf{u} = (u, v)$ can then be derived from:\n\n$$u = -\\frac{\\partial \\Psi}{\\partial y} + \\frac{\\partial \\Phi}{\\partial x}, \\qquad v = \\frac{\\partial \\Psi}{\\partial x} + \\frac{\\partial \\Phi}{\\partial y}. \\tag{1}$$\n\nThis produces good prior models for wind fields when a suitable choice of covariance function for $\\Phi$ and $\\Psi$ is made. We have investigated using a modified Bessel function based covariance\u00b2 (Handcock and Wallis, 1994) but found, using three years of wind data for the North Atlantic, that the maximum a posteriori value for the smoothness parameter\u00b3 in this covariance function was approximately 2.5. Thus we used the correlation function:\n\n$$\\rho(r) = \\left(1 + \\frac{r}{L} + \\frac{r^2}{3L^2}\\right) \\exp\\left(-\\frac{r}{L}\\right) \\tag{2}$$\n\nwhere $L$ is the correlation length scale. This form is equivalent to the modified Bessel function covariance at this smoothness value and is less computationally demanding (Cornford, 1998).\n\n[Figure 1a flowchart: (1) Simulate frontal position, orientation and direction. (2) Simulate along both sides of front using $GP_1$. (3) Simulate wind fields either side of front conditionally on that side's frontal winds using $GP_2$.]\n\nFigure 1: (a) Flowchart describing the generative frontal model. See text for full description. (b) A description of the frontal model.\n\n\u00b9See http://www.ncrg.aston.ac.uk/Projects/NEUROSAT/NEUROSAT.html for details of the scatterometer work. Technical reports describing, in more detail, methods for generating prior wind field models can also be accessed from the same page.\n\n\u00b2The modified Bessel function allows us to control the differentiability of the sample realisations through the 'smoothness parameter', as well as the length scales and variances.\n\n\u00b3This varies with season, but is the most temporally stable parameter in the covariance function.\n\nThe generative model has the form outlined in Figure 1a. Initially the frontal position and orientation are simulated. They are defined by the angle clockwise from north ($\\phi_f$) that the front makes and a point on the line $(x_f, y_f)$. Having defined the position of the front, the angle of the wind across the front ($\\alpha_f$) is simulated from a distribution covering the range $[0, \\pi)$. This angle is related to the vertical component of vorticity ($\\zeta$) across the front through $\\zeta = \\mathbf{k} \\cdot \\nabla \\times \\mathbf{u} \\propto \\cos(\\alpha_f / 2)$ and the constraint $\\alpha_f \\in [0, \\pi)$ ensures cyclonic vorticity at the front. It is assumed that the front bisects $\\alpha_f$. The wind speed ($s_f$) is then simulated at the front. 
Since there is generally little change in wind speed across the front, one value is simulated for both sides of the front. These components $\\theta_f = (\\phi_f, x_f, y_f, \\alpha_f, s_f)$ define the line of the front and the mean wind vectors just ahead of and just behind the front (Figure 1b):\n\nA realistic model requires some variability in wind vectors along the front. Thus we use a GP with a non-zero mean ($m_{1a}$ or $m_{1b}$) along the line of the front. In the real atmosphere we observe a smaller variability in the wind vectors along the line of the front compared with regions away from fronts. Thus we use different GP parameters along the front ($GP_1$) from those used in the wind field away from the front ($GP_2$), although the same $GP_1$ parameters are used on both sides of the front, just with different means. The winds just ahead of and behind the front are assumed conditionally independent given $m_{1a}$ and $m_{1b}$, and are simulated at a regular 50 km spacing. The final step in the generative model is to simulate wind vectors using $GP_2$ in both regions either side of the front, conditionally on the values along that side of the front. This model is flexible enough to represent fronts, yet has the required constraints derived from meteorological principles, for example that fronts should always be associated with cyclonic vorticity and that discontinuities at the model scale should be in wind direction but not in wind speed\u2074. To make this generative model useful for inference, we need to be able to compute the data likelihood, which is the subject of the next section.\n\n3 GPs WITH CONSTRAINED DISCONTINUITIES\n\nFigure 2: (a) The discontinuity in one of the vector components in a simulation. (b) Framework for GPs with boundary conditions. The curve $D_1$ has $n_1$ sample points with values $Z_1$. 
The domain $D_2$ has $n_2$ points with values $Z_2$.\n\n\u2074The model allows small discontinuities in wind speed, which are consistent with frontal dynamics.\n\nWe consider data from two domains $D_1$ and $D_2$ (Figure 2b), where in this case $D_1$ is a curve in the plane which is intended to be the front and $D_2$ is a region of the plane. We obtain $n_1$ variables $Z_1$ at points $X_1$ along the curve, and we assume these are generated under $GP_1$ (a GP which depends on parameters $\\theta_1$ and has mean $m_1 = m_1 \\mathbf{1}$, a constant vector, which will be determined by (3) or (4)). We are interested in determining the likelihood of the variables $Z_2$ observed at $n_2$ points $X_2$ under $GP_2$, which depends on parameters $\\theta_2$, conditioned on the 'constrained discontinuities' at the front.\n\nWe evaluate this by calculating the likelihood of $Z_2$ conditioned on the $n_1$ values of $Z_1$ from $GP_1$ along the front and marginalising out $Z_1$:\n\n$$p(Z_2 | \\theta_2, \\theta_1, m_1) = \\int_{-\\infty}^{\\infty} p(Z_2 | Z_1, \\theta_2, \\theta_1, m_1) \\, p(Z_1 | \\theta_1, m_1) \\, dZ_1. \\tag{5}$$\n\nFrom the definition of the likelihood of a GP (Cressie, 1993) we find:\n\n$$p(Z_2 | Z_1, \\theta_2, \\theta_1, m_1) = \\frac{1}{(2\\pi)^{n_2/2} |S_{22}|^{1/2}} \\exp\\left(-\\frac{1}{2} Z_2^{*\\prime} S_{22}^{-1} Z_2^{*}\\right) \\tag{6}$$\n\nwhere:\n\n$$S_{22} = K_{22|2} - K_{12|2}' K_{11|2}^{-1} K_{12|2}, \\qquad Z_2^{*} = Z_2 - K_{12|2}' K_{11|2}^{-1} Z_1.$$\n\nTo understand the notation consider the joint distribution of $Z_1$, $Z_2$ and in particular its covariance matrix:\n\n$$K = \\begin{pmatrix} K_{11|2} & K_{12|2} \\\\\\\\ K_{21|2} & K_{22|2} \\end{pmatrix} \\tag{7}$$\n\nwhere $K_{11|2}$ is the $n_1 \\times n_1$ covariance matrix between the points in $D_1$ evaluated using $\\theta_2$, $K_{12|2} = K_{21|2}'$ is the $n_1 \\times n_2$ (cross) covariance matrix between the points in $D_1$ and $D_2$ evaluated using $\\theta_2$, and $K_{22|2}$ is the usual $n_2 \\times n_2$ covariance for points in $D_2$. Thus we can see that $S_{22}$ is the $n_2 \\times n_2$ modified covariance for the points in $D_2$ given the points along $D_1$, while $Z_2^{*}$ is $Z_2$ corrected by the conditional mean, which accounts for the values at the points in $D_1$, which have non-zero mean.\n\nWe remove the dependency on the values $Z_1$ by evaluating the integral in (5). 
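
As a numerical illustration of the conditioning in (6) and (7) and of the integral in (5), the following is a minimal Python sketch, not the authors' MATLAB code. It assumes a scalar GP with unit variance on one-dimensional coordinates and uses the correlation function (2) for both GPs. It also relies on the standard linear-Gaussian result that marginalising Z1 leaves Z2 Gaussian with mean A m1 and covariance S22 + A K11|1 A', where A = K12|2' K11|2^{-1}. All function names, the length-scale arguments len1 and len2, and the example data are hypothetical.

```python
import numpy as np

def corr(xa, xb, length):
    # Correlation function (2): (1 + r/L + r^2 / (3 L^2)) exp(-r/L), 1-D inputs.
    r = np.abs(xa[:, None] - xb[None, :]) / length
    return (1.0 + r + r**2 / 3.0) * np.exp(-r)

def gauss_logpdf(z, mean, cov):
    # Log density of a multivariate Gaussian, evaluated via a Cholesky factor.
    chol = np.linalg.cholesky(cov)
    w = np.linalg.solve(chol, z - mean)
    return -0.5 * (w @ w) - np.log(np.diag(chol)).sum() - 0.5 * len(z) * np.log(2 * np.pi)

def conditional_terms(x1, x2, z1, z2, len2):
    # S22 and Z2* as in (6)-(7); every covariance here uses the GP2 parameters.
    k11 = corr(x1, x1, len2)
    k12 = corr(x1, x2, len2)             # n1 x n2 cross covariance
    a = np.linalg.solve(k11, k12).T      # A = K12' K11^{-1}, shape n2 x n1
    s22 = corr(x2, x2, len2) - a @ k12
    return a, s22, z2 - a @ z1

def marginal_loglik(x1, x2, z2, m1, len1, len2):
    # Closed form of the integral (5): Z2 ~ N(A m1, S22 + A K11|1 A').
    a, s22, _ = conditional_terms(x1, x2, np.zeros(len(x1)), z2, len2)
    k11_1 = corr(x1, x1, len1)           # GP1 covariance along the front
    return gauss_logpdf(z2, a @ m1, s22 + a @ k11_1 @ a.T)

# Hypothetical toy data: three points on the front, four in the domain.
x1 = np.array([0.0, 1.0, 2.0])
x2 = np.array([0.5, 1.5, 3.0, 4.0])
z2 = np.array([0.3, -0.2, 0.8, 1.1])
m1 = np.full(3, 0.5)
loglik = marginal_loglik(x1, x2, z2, m1, len1=2.0, len2=1.0)
```

Working with a Cholesky factor rather than explicit inverses is a design choice of this sketch; for vector-valued wind fields the covariance blocks would be built from the velocity potential and stream function covariances instead of corr.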
$p(Z_1 | \\theta_1, m_1)$ is given by:\n\n$$p(Z_1 | \\theta_1, m_1) = \\frac{1}{(2\\pi)^{n_1/2} |K_{11|1}|^{1/2}} \\exp\\left(-\\frac{1}{2} (Z_1 - m_1)' K_{11|1}^{-1} (Z_1 - m_1)\\right) \\tag{8}$$\n\nwhere $K_{11|1}$ is the $n_1 \\times n_1$ covariance matrix between the points in $D_1$ evaluated under the covariance given by $\\theta_1$. Completing the square in $Z_1$ in the exponent, the integral (5) can be evaluated to give:\n\n$$p(Z_2 | \\theta_2, \\theta_1, m_1) = \\frac{1}{(2\\pi)^{n_2/2}} \\frac{1}{|S_{22}|^{1/2}} \\frac{1}{|K_{11|1}|^{1/2}} \\frac{1}{|B|^{1/2}} \\exp\\left(\\frac{1}{2} \\left(C' B^{-1} C - Z_2' S_{22}^{-1} Z_2 - m_1' K_{11|1}^{-1} m_1\\right)\\right) \\tag{9}$$\n\nwhere:\n\n$$B = (K_{12|2}' K_{11|2}^{-1})' S_{22}^{-1} K_{12|2}' K_{11|2}^{-1} + K_{11|1}^{-1}, \\qquad C' = Z_2' S_{22}^{-1} K_{12|2}' K_{11|2}^{-1} + m_1' K_{11|1}^{-1}.$$\n\nThe algorithm has been coded in MATLAB and can deal with reasonably large numbers of points quickly. For a two dimensional vector-valued GP with $n_1 = 12$ and $n_2 = 200$\u2075 and a covariance function given by (2), computation of the log likelihood takes 4.13 seconds on an SGI Indy R5000.\n\n\u2075This is equivalent to $n_1 = 24$ and $n_2 = 400$ for a scalar GP.\n\nThe mean values just ahead of and behind the front define the mean values for the constrained discontinuity (i.e. $m_1$ in (9)). Conditional on the frontal parameters the wind fields either side (Figure 3a) are assumed independent:\n\n$$p(Z_{2a}, Z_{2b} | \\theta_2, \\theta_1, \\theta_f) = p(Z_{2a} | \\theta_2, \\theta_1, m_{1a}) \\, p(m_{1a} | \\theta_f) \\times p(Z_{2b} | \\theta_2, \\theta_1, m_{1b}) \\, p(m_{1b} | \\theta_f)$$\n\nwhere we have performed the integration (5) to remove the dependency on $Z_{1a}$ and $Z_{1b}$. Thus the likelihood of the data $Z_2 = (Z_{2a}, Z_{2b})$ given the model parameters $\\theta_2, \\theta_1, \\theta_f$ is simply the product of the likelihoods of two GPs with a constrained discontinuity, each of which can be computed using (9).\n\nFigure 3: (a) The division of the wind field using the generative frontal model. $Z_{1a}$, $Z_{1b}$ are the wind fields just ahead of and behind the front, along its length, respectively. $Z_{2a}$, $Z_{2b}$ are the wind fields in the regions ahead of and behind the front respectively. (b) An example from the generative frontal model: the wind field looks like a typical 'cold front'.\n\nThe model outlined above was tested on simulated data generated from the model to assess parameter sensitivity. We generated a wind field $Z^o = (Z_{2a}, Z_{2b})$ using known model parameters (e.g. Figure 3b). We then sampled the model parameters from the posterior distribution:\n\n$$p(\\theta_2, \\theta_1, \\theta_f | Z^o) \\propto p(Z^o | \\theta_2, \\theta_1, \\theta_f) \\, p(\\theta_2) \\, p(\\theta_1) \\, p(\\theta_f) \\tag{10}$$\n\nwhere $p(\\theta_2)$, $p(\\theta_1)$, $p(\\theta_f)$ are prior distributions over the parameters in the GPs and front models. This brings out one advantage of the proposed model. All the model parameters have a physical interpretation and thus expert knowledge was used to set priors which produce realistic wind fields. We will also use (10) to help set (hyper)priors using real data in $Z^o$.\n\nMCMC using the Metropolis algorithm (Neal, 1993) is used to sample from (10) using the NETLAB\u2076 library. Convergence of the Markov chain is currently assessed using visual inspection of the univariate sample paths since the generating parameters are known, although other diagnostics could be used (Cowles and Carlin, 1996). We find that the procedure is insensitive to the initial value of the GP parameters, but that the parameters describing the location of the front $(\\phi_f, d_f)$ need to be initialised 'close' to the correct values if the chain is to converge on a reasonable time-scale. 
In the application some preliminary analysis of the wind field would be necessary to identify possible fronts and thus set the initial parameters to 'sensible' values. We intend to fit a vector-valued GP without any discontinuities and then measure the 'strain' or misfit of the locally predicted winds with the winds fitted by the GP. Lines of large 'strain' will be used to initialise the front parameters.\n\n\u2076Available from http://www.ncrg.aston.ac.uk/netlab/index.html.\n\nFigure 4: Examples from the Markov chain of the posterior distribution (10). (a) The energy = negative log posterior probability. Note that the energy when the chain was initialised was 2789 and the first 27 values are outside the range of the y-axis. (b) The angle of the front relative to north ($\\phi_f$). [Both panels: x-axis, sample number ($\\times 10^4$).]\n\nFigure 5: Examples from the Markov chain of the posterior distribution (10). (a) The angle of the wind across the front ($\\alpha_f$). (b) Histogram of the posterior distribution of $\\alpha_f$ allowing a 10000 iteration burn-in period. [Panel (b): x-axis, angle of wind (radians).]\n\nExamples of samples from the Markov chain from the simulated wind field shown in Figure 3a can be seen in Figures 4 and 5. Figure 4a shows that the energy level (= negative log posterior probability) falls very rapidly to near its minimum value from its large starting value of 2789. In these plots the true parameters for the front were $\\phi_f = 0.555$, $\\alpha_f = 2.125$ while the initial values were set at $\\phi_f = 0.89$, $\\alpha_f = 1.49$. Other parameters were also incorrectly set. 
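
The sampling behaviour described here can be illustrated with a random-walk Metropolis sampler acting on an energy function. The following is a minimal Python sketch, not the NETLAB implementation. The function toy_energy is a hypothetical stand-in for the negative log of (10), built from independent Gaussians centred on the generating values of the front angle and the wind angle; its widths and the proposal step sizes are made up for illustration. The starting values and the 10000-iteration burn-in follow the text.

```python
import math
import random

def metropolis(energy, start, step, n_samples, seed=0):
    # Random-walk Metropolis; energy is the negative log posterior.
    rng = random.Random(seed)
    current = list(start)
    e_current = energy(current)
    chain, energies = [], []
    for _ in range(n_samples):
        proposal = [c + rng.gauss(0.0, s) for c, s in zip(current, step)]
        e_proposal = energy(proposal)
        # Accept with probability min(1, exp(e_current - e_proposal)).
        if e_proposal <= e_current or rng.random() < math.exp(e_current - e_proposal):
            current, e_current = proposal, e_proposal
        chain.append(list(current))
        energies.append(e_current)
    return chain, energies

def toy_energy(theta):
    # Hypothetical energy: Gaussians centred on phi_f = 0.555, alpha_f = 2.125.
    phi_f, alpha_f = theta
    if not 0.0 <= alpha_f < math.pi:
        return math.inf                  # prior constraint: alpha_f in [0, pi)
    return 0.5 * ((phi_f - 0.555) / 0.05) ** 2 + 0.5 * ((alpha_f - 2.125) / 0.1) ** 2

# Start from the deliberately wrong initial values quoted in the text.
chain, energies = metropolis(toy_energy, start=[0.89, 1.49],
                             step=[0.03, 0.06], n_samples=20000)
burn_in = 10000                          # discard the first 10000 iterations
kept = chain[burn_in:]
mean_phi = sum(s[0] for s in kept) / len(kept)
mean_alpha = sum(s[1] for s in kept) / len(kept)
```

Plotting energies against iteration gives a trace of the same kind as Figure 4a, and a histogram of the second component of kept corresponds to Figure 5b, although the numbers produced by this toy energy are not those of the paper.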
The Metropolis algorithm seems to be able to find the minimum and then stay in it.\n\nFigures 4b and 5a show the Markov chains for $\\phi_f$ and $\\alpha_f$. Both converge quickly to apparently stationary distributions, which have mean values very close to the 'true' generating parameters. The histogram of the distribution of $\\alpha_f$ is shown in Figure 5b.\n\n4 DISCUSSION AND CONCLUSIONS\n\nSimulations from our model are meteorologically plausible wind fields which contain fronts. It is possible that similar models could usefully be applied to other modelling problems where there are discontinuities with known properties. A method for the computation of the likelihood of data given two GP models, one with non-zero mean on the boundary and another in the domain in which the data is observed, has been given. This allows us to perform inference on the parameters in the frontal model using a Bayesian approach of sampling from the posterior distribution using an MCMC algorithm.\n\nThere are several weaknesses in the model specifically for fronts, which could be improved with further work. Real atmospheric fronts are not straight, thus the model would be improved by allowing 'curved' fronts. We could represent the position of the front, oriented along the angle defined by $\\phi_f$, using either another smooth GP, B-splines or possibly polynomials.\n\nCurrently the points along the line of the front are simulated at the mean observation spacing in the rest of the wind field (approximately 50 km). Interesting questions remain about the (in-fill) asymptotics (Cressie, 1993) as the distance between the points along the front tends to zero. 
Empirical evidence suggests that as long as the spacing along the front is 'much less' than the length scale of the GP along the front (which is typically around 1000 km) then the spacing does not significantly affect the results.\n\nAlthough we currently use a Metropolis algorithm for sampling from the Markov chain, the derivative of (9) with respect to the GP parameters $\\theta_1$ and $\\theta_2$ could be computed analytically and used in a hybrid Monte Carlo procedure (Neal, 1993).\n\nThese improvements should lead to a relatively robust procedure for putting priors over wind fields which will be used with real data when retrieving wind vectors from scatterometer observations over the ocean.\n\nAcknowledgements\n\nThis work was partially supported by the European Union funded NEUROSAT programme (grant number ENV 4 CT96-0314) and also EPSRC grant GR/L03088 Combining Spatially Distributed Predictions from Neural Networks.\n\nReferences\n\nCornford, D. 1998. Flexible Gaussian Process Wind Field Models. Technical Report NCRG/98/017, Neural Computing Research Group, Aston University, Aston Triangle, Birmingham, UK.\n\nCowles, M. K. and B. P. Carlin 1996. Markov-Chain Monte-Carlo Convergence Diagnostics: A Comparative Review. Journal of the American Statistical Association 91, 883-904.\n\nCressie, N. A. C. 1993. Statistics for Spatial Data. New York: John Wiley and Sons.\n\nDaley, R. 1991. Atmospheric Data Analysis. Cambridge: Cambridge University Press.\n\nHandcock, M. S. and J. R. Wallis 1994. An Approach to Statistical Spatio-Temporal Modelling of Meteorological Fields. Journal of the American Statistical Association 89, 368-378.\n\nNeal, R. M. 1993. Probabilistic Inference Using Markov Chain Monte Carlo Methods. Technical Report CRG-TR-93-1, Department of Computer Science, University of Toronto. URL: http://www.cs.utoronto.ca/ ... 
radford. \n\n\f", "award": [], "sourceid": 1502, "authors": [{"given_name": "Dan", "family_name": "Cornford", "institution": null}, {"given_name": "Ian", "family_name": "Nabney", "institution": null}, {"given_name": "Christopher", "family_name": "Williams", "institution": null}]}