{"title": "The Unscented Particle Filter", "book": "Advances in Neural Information Processing Systems", "page_first": 584, "page_last": 590, "abstract": null, "full_text": "The Unscented Particle Filter\n\nRudolph van der Merwe\nOregon Graduate Institute\nElectrical and Computer Engineering\nP.O. Box 91000, Portland, OR 97006, USA\nrvdmerwe@ece.ogi.edu\n\nArnaud Doucet\nCambridge University\nEngineering Department\nCambridge CB2 1PZ, England\nad2@eng.cam.ac.uk\n\nNando de Freitas\nUC Berkeley, Computer Science\n387 Soda Hall, Berkeley\nCA 94720-1776 USA\njfgf@cs.berkeley.edu\n\nEric Wan\nOregon Graduate Institute\nElectrical and Computer Engineering\nP.O. Box 91000, Portland, OR 97006, USA\nericwan@ece.ogi.edu\n\nAbstract\n\nIn this paper, we propose a new particle filter based on sequential\nimportance sampling. The algorithm uses a bank of unscented filters\nto obtain the importance proposal distribution. This proposal\nhas two very \"nice\" properties. Firstly, it makes efficient use of\nthe latest available information and, secondly, it can have heavy\ntails. As a result, we find that the algorithm outperforms standard\nparticle filtering and other nonlinear filtering methods very\nsubstantially. This experimental finding is in agreement with the\ntheoretical convergence proof for the algorithm. The algorithm\nalso includes resampling and (possibly) Markov chain Monte Carlo\n(MCMC) steps.\n\n1 Introduction\n\nFiltering is the problem of estimating the states (parameters or hidden variables)\nof a system as a set of observations becomes available on-line. This problem is\nof paramount importance in many fields of science, engineering and finance. To\nsolve it, one begins by modelling the evolution of the system and the noise in the\nmeasurements.  
The resulting models typically exhibit complex nonlinearities and\nnon-Gaussian distributions, thus precluding analytical solution.\n\nThe best known algorithm to solve the problem of non-Gaussian, nonlinear filtering\n(filtering for short) is the extended Kalman filter (EKF) (Anderson and Moore 1979).\nThis filter is based upon the principle of linearising the measurement and evolution\nmodels using Taylor series expansions. The series approximations in the EKF\nalgorithm can, however, lead to poor representations of the nonlinear functions and\nprobability distributions of interest. As a result, this filter can diverge.\n\nRecently, Julier and Uhlmann (Julier and Uhlmann 1997) have introduced a filter\nfounded on the intuition that it is easier to approximate a Gaussian distribution\nthan it is to approximate arbitrary nonlinear functions. They named this filter\nthe unscented Kalman filter (UKF). They have shown that the UKF leads to more\naccurate results than the EKF and that in particular it generates much better\nestimates of the covariance of the states (the EKF seems to underestimate this\nquantity). The UKF has, however, the limitation that it does not apply to general\nnon-Gaussian distributions.\n\nAnother popular solution strategy for the general filtering problem is to use sequential\nMonte Carlo methods, also known as particle filters (PFs): see for example\n(Doucet, Godsill and Andrieu 2000, Doucet, de Freitas and Gordon 2001, Gordon,\nSalmond and Smith 1993). These methods allow for a complete representation of\nthe posterior distribution of the states, so that any statistical estimates, such as the\nmean, modes, kurtosis and variance, can be easily computed. They can, therefore,\ndeal with any nonlinearities or distributions. 
\n\nPFs rely on importance sampling and, as a result, require the design of proposal\ndistributions that can approximate the posterior distribution reasonably well. In\ngeneral, it is hard to design such proposals. The most common strategy is to sample\nfrom the probabilistic model of the state evolution (transition prior). This strategy\ncan, however, fail if the new measurements appear in the tail of the prior or if the\nlikelihood is too peaked in comparison to the prior. This situation does indeed arise\nin several areas of engineering and finance, where one can encounter sensors that\nare very accurate (peaked likelihoods) or data that undergoes sudden changes\n(non-stationarities): see for example (Pitt and Shephard 1999, Thrun 2000). To overcome\nthis problem, several techniques based on linearisation have been proposed in the\nliterature (de Freitas 1999, de Freitas, Niranjan, Gee and Doucet 2000, Doucet et al.\n2000, Pitt and Shephard 1999). For example, in (de Freitas et al. 2000), the EKF\nGaussian approximation is used as the proposal distribution for a PF. In this paper,\nwe follow the same approach, but replace the EKF proposal by a UKF proposal. The\nresulting filter should perform better not only because the UKF is more accurate,\nbut because it also allows one to control the rate at which the tails of the proposal\ndistribution go to zero. It thus becomes possible to adopt heavier-tailed distributions\nas proposals and, consequently, obtain better importance samplers (Gelman, Carlin,\nStern and Rubin 1995). Readers are encouraged to consult our technical report for\nfurther results and implementation details (van der Merwe, Doucet, de Freitas and\nWan 2000)\u00b9.\n\n2 Dynamic State Space Model\n\nWe apply our algorithm to general state space models consisting of a transition\nequation $p(x_t | x_{t-1})$ and a measurement equation $p(y_t | x_t)$.  
That is, the states follow\na Markov process and the observations are assumed to be independent given the\nstates. For example, if we are interested in nonlinear, non-Gaussian regression, the\nmodel can be expressed as follows\n\n$x_t = f(x_{t-1}, v_{t-1})$\n$y_t = h(u_t, x_t, n_t)$\n\nwhere $u_t \\in \\mathbb{R}^{n_u}$ denotes the input data at time $t$, $x_t \\in \\mathbb{R}^{n_x}$ denotes the states (or\nparameters) of the model, $y_t \\in \\mathbb{R}^{n_y}$ the observations, $v_t \\in \\mathbb{R}^{n_v}$ the process noise\nand $n_t \\in \\mathbb{R}^{n_n}$ the measurement noise. The mappings $f : \\mathbb{R}^{n_x} \\times \\mathbb{R}^{n_v} \\mapsto \\mathbb{R}^{n_x}$ and\n$h : (\\mathbb{R}^{n_x} \\times \\mathbb{R}^{n_u}) \\times \\mathbb{R}^{n_n} \\mapsto \\mathbb{R}^{n_y}$ represent the deterministic process and measurement\nmodels. To complete the specification of the model, the prior distribution (at $t = 0$)\nis denoted by $p(x_0)$. Our goal will be to approximate the posterior distribution\n$p(x_{0:t} | y_{1:t})$ and one of its marginals, the filtering density $p(x_t | y_{1:t})$,\nwhere $y_{1:t} = \\{y_1, y_2, \\ldots, y_t\\}$. By computing the filtering density recursively, we do not need to\nkeep track of the complete history of the states.\n\n\u00b9The TR and software are available at http://www.cs.berkeley.edu/~jfgf.\n\n3 Particle Filtering\n\nParticle filters allow us to approximate the posterior distribution $p(x_{0:t} | y_{1:t})$ using\na set of $N$ weighted samples (particles) $\\{x_{0:t}^{(i)}; i = 1, \\ldots, N\\}$, which are drawn from\nan importance proposal distribution $q(x_{0:t} | y_{1:t})$. These samples are propagated\nin time as shown in Figure 1. In doing so, it becomes possible to map intractable\nintegration problems (such as computing expectations and marginal distributions)\nto easy summations. This is done in a rigorous setting that ensures convergence\naccording to the strong law of large numbers\n\n$\\frac{1}{N} \\sum_{i=1}^{N} f_t(x_{0:t}^{(i)}) \\xrightarrow[N \\to \\infty]{a.s.} \\int f_t(x_{0:t})\\, p(dx_{0:t} | y_{1:t})$\n\nwhere $\\xrightarrow{a.s.}$ denotes almost sure convergence and $f_t : \\mathbb{R}^{n_x} \\to \\mathbb{R}^{n_{f_t}}$ is some function\nof interest.  
For example, it could be the conditional mean, in which case\n$f_t(x_{0:t}) = x_{0:t}$, or the conditional covariance of $x_t$ with\n$f_t(x_{0:t}) = x_t x_t' - \\mathbb{E}_{p(x_t | y_{1:t})}[x_t]\\, \\mathbb{E}'_{p(x_t | y_{1:t})}[x_t]$.\n\n[Figure 1 diagram: $i = 1, \\ldots, N = 10$ particles]\n\nFigure 1: In this example, a particle filter starts at time $t-1$ with an unweighted\nmeasure $\\{\\tilde{x}_{t-1}^{(i)}, N^{-1}\\}$, which provides an approximation of $p(x_{t-1} | y_{1:t-2})$. For each\nparticle we compute the importance weights using the information at time $t-1$.\nThis results in the weighted measure $\\{\\tilde{x}_{t-1}^{(i)}, \\tilde{w}_{t-1}^{(i)}\\}$, which yields an approximation of\n$p(x_{t-1} | y_{1:t-1})$. Subsequently, a resampling step selects only the \"fittest\" particles\nto obtain the unweighted measure $\\{x_{t-1}^{(i)}, N^{-1}\\}$, which is still an approximation of\n$p(x_{t-1} | y_{1:t-1})$. Finally, the sampling (prediction) step introduces variety, resulting\nin the measure $\\{\\tilde{x}_t^{(i)}, N^{-1}\\}$.\n\nA generic PF algorithm involves the following steps.\n\nGeneric PF\n\n1. Sequential importance sampling step\n\n\u2022 For $i = 1, \\ldots, N$, sample $\\tilde{x}_t^{(i)} \\sim q(x_t | x_{0:t-1}^{(i)}, y_{1:t})$ and update the trajectories\n$\\tilde{x}_{0:t}^{(i)} \\triangleq (\\tilde{x}_t^{(i)}, x_{0:t-1}^{(i)})$.\n\n\u2022 For $i = 1, \\ldots, N$, evaluate the importance weights up to a normalizing constant:\n$w_t^{(i)} = \\frac{p(\\tilde{x}_{0:t}^{(i)} | y_{1:t})}{q(\\tilde{x}_t^{(i)} | x_{0:t-1}^{(i)}, y_{1:t})\\, p(x_{0:t-1}^{(i)} | y_{1:t-1})}$.\n\n\u2022 For $i = 1, \\ldots, N$, normalize the weights: $\\tilde{w}_t^{(i)} = w_t^{(i)} \\left[ \\sum_{j=1}^{N} w_t^{(j)} \\right]^{-1}$.\n\n2. Selection step\n\n\u2022 Multiply/suppress samples $(\\tilde{x}_{0:t}^{(i)})$ with high/low importance weights $\\tilde{w}_t^{(i)}$,\nrespectively, to obtain $N$ random samples $(x_{0:t}^{(i)})$ approximately distributed according\nto $p(x_{0:t}^{(i)} | y_{1:t})$.\n\n3.  
MCMC step\n\n\u2022 Apply a Markov transition kernel with invariant distribution given by $p(x_{0:t}^{(i)} | y_{1:t})$\nto obtain $(x_{0:t}^{(i)})$.\n\nIn the above algorithm, we can restrict ourselves to importance functions of the\nform $q(x_{0:t} | y_{1:t}) = q(x_0) \\prod_{k=1}^{t} q(x_k | y_{1:k}, x_{1:k-1})$ to obtain a recursive formula to\nevaluate the importance weights\n\n$w_t \\propto \\frac{p(y_t | y_{1:t-1}, x_{0:t})\\, p(x_t | x_{t-1})}{q(x_t | y_{1:t}, x_{1:t-1})}$\n\nThere are infinitely many possible choices for $q(x_{0:t} | y_{1:t})$, the only condition being\nthat its support must include that of $p(x_{0:t} | y_{1:t})$. The simplest choice is to just\nsample from the prior, $p(x_t | x_{t-1})$, in which case the importance weight is equal to\nthe likelihood, $p(y_t | y_{1:t-1}, x_{0:t})$. This is the most widely used distribution, since\nit is simple to compute, but it can be inefficient, since it ignores the most recent\nevidence, $y_t$.\n\nThe selection (resampling) step is used to eliminate the particles having low importance\nweights and to multiply particles having high importance weights (Gordon\net al. 1993). This is done by mapping the weighted measure $\\{\\tilde{x}_t^{(i)}, \\tilde{w}_t^{(i)}\\}$ to an unweighted\nmeasure $\\{x_t^{(i)}, N^{-1}\\}$ that provides an approximation of $p(x_t | y_{1:t})$. After\nthe selection scheme at time $t$, we obtain $N$ particles distributed marginally approximately\naccording to $p(x_{0:t} | y_{1:t})$. One can, therefore, apply a Markov kernel\n(for example, a Metropolis or Gibbs kernel) to each particle and the resulting distribution\nwill still be $p(x_{0:t} | y_{1:t})$. This step usually allows us to obtain better results\nand to treat more complex models (de Freitas 1999).\n\n4 The Unscented Particle Filter\n\nAs mentioned earlier, using the transition prior as the proposal distribution can be\ninefficient.  
As illustrated in Figure 2, if we fail to use the latest available information\nto propose new values for the states, only a few particles might survive. It\nis therefore of paramount importance to move the particles towards the regions of\nhigh likelihood. To achieve this, we propose to use the unscented filter as the proposal\ndistribution. This simply requires that we propagate the sufficient statistics of the\nUKF (the mean and covariance) for each particle. For exact details, please refer to our technical report (van\nder Merwe et al. 2000).\n\n[Figure 2 diagram: prior and likelihood]\n\nFigure 2: The UKF proposal distribution allows us to move the samples in the prior\nto regions of high likelihood. This is of paramount importance if the likelihood\nhappens to lie in one of the tails of the prior distribution, or if it is too narrow (low\nmeasurement error).\n\n5 Theoretical Convergence\n\nLet $B(\\mathbb{R}^n)$ be the space of bounded, Borel measurable functions on $\\mathbb{R}^n$. We denote\n$\\|f\\| \\triangleq \\sup_{x \\in \\mathbb{R}^n} |f(x)|$. The following theorem is a straightforward extension of previous\nresults in (Crisan and Doucet 2000).\n\nTheorem 1 If the importance weight\n\n$w_t \\propto \\frac{p(y_t | x_t)\\, p(x_t | x_{t-1})}{q(x_t | x_{0:t-1}, y_{1:t})}$ (1)\n\nis upper bounded for any $(x_{t-1}, y_t)$, then, for all $t \\ge 0$, there exists $c_t$ independent\nof $N$, such that for any $f_t \\in B(\\mathbb{R}^{n_x \\times (t+1)})$\n\n$\\mathbb{E}\\left[ \\left( \\frac{1}{N} \\sum_{i=1}^{N} f_t(x_{0:t}^{(i)}) - \\int f_t(x_{0:t})\\, p(dx_{0:t} | y_{1:t}) \\right)^2 \\right] \\le c_t \\frac{\\|f_t\\|^2}{N}$ (2)\n\nThe expectation in equation 2 is with respect to the randomness introduced by the\nparticle filtering algorithm. This convergence result shows that, under very loose\nassumptions, convergence of the (unscented) particle filter is ensured and that the\nconvergence rate of the method is independent of the dimension of the state-space. 
\nThe only crucial assumption is to ensure that $w_t$ is upper bounded, that is, that the\nproposal distribution $q(x_t | x_{0:t-1}, y_{1:t})$ has heavier tails than $p(y_t | x_t)\\, p(x_t | x_{t-1})$.\nConsidering this theoretical result, it is not surprising that the UKF (which has\nheavier tails than the EKF) can yield better estimates.\n\n6 Demonstration\n\nFor this experiment, a time-series is generated by the following process model\n$x_{t+1} = 1 + \\sin(\\omega \\pi t) + \\phi x_t + v_t$, where $v_t$ is a Gamma(3,2) random variable modeling the\nprocess noise, and $\\omega = 4 \\times 10^{-2}$ and $\\phi = 0.5$ are scalar parameters. A non-stationary\nobservation model, whose form switches at $t = 30$ (one expression for $t \\le 30$, another for $t > 30$),\nis used. The observation noise, $n_t$, is drawn from a zero-mean Gaussian distribution.\nGiven only the noisy observations, $y_t$, a few different filters were used to estimate\nthe underlying clean state sequence $x_t$ for $t = 1, \\ldots, 60$. The experiment was repeated\n100 times with random re-initialization for each run. All of the particle filters used\n200 particles. Table 1 summarizes the performance of the different filters.\n\nAlgorithm                                                     MSE mean   MSE var\nExtended Kalman Filter (EKF)                                  0.374      0.015\nUnscented Kalman Filter (UKF)                                 0.280      0.012\nParticle Filter: generic                                      0.424      0.053\nParticle Filter: MCMC move step                               0.417      0.055\nParticle Filter: EKF proposal                                 0.310      0.016\nParticle Filter: EKF proposal and MCMC move step              0.307      0.015\nParticle Filter: UKF proposal (\"Unscented Particle Filter\")   0.070      0.006\nParticle Filter: UKF proposal and MCMC move step              0.074      0.008\n\nTable 1: Mean and variance of the MSE calculated over 100 independent runs.\n\nThe table shows the means and variances of the mean-square-error (MSE) of the state\nestimates. Note that MCMC could improve results in other situations.  
Figure 3\ncompares the estimates generated from a single run of the different particle filters.\nThe superior performance of the unscented particle filter is clearly evident. Figure\n4 shows the estimates of the state covariance generated by a stand-alone EKF and\nUKF for this problem. Notice how the EKF's estimates are consistently smaller than\nthose generated by the UKF. This property makes the UKF better suited than the\nEKF for proposal distribution generation within the particle filter framework.\n\n[Figure 3 plot: horizontal axis \"Time\"]\n\nFigure 3: Plot of the state estimates generated by different filters.\n\n[Figure 4 plot: \"Estimates of state covariance\"; legend: EKF, UKF; horizontal axis \"time\"]\n\nFigure 4: EKF and UKF estimates of state covariance.\n\n7 Conclusions\n\nWe proposed a new particle filter that uses unscented filters as proposal distributions.\nThe convergence proof and empirical evidence clearly demonstrate that this\nalgorithm can lead to substantial improvements over other nonlinear filtering algorithms.\nThe algorithm is well suited for engineering applications, where the sensors\nare very accurate but nonlinear, and financial time series, where outliers and heavy-tailed\ndistributions play a significant role in the analysis of the data. For further\ndetails and experiments, please refer to our report (van der Merwe et al. 2000).\n\nReferences\n\nAnderson, B. D. and Moore, J. B. (1979). Optimal Filtering, Prentice-Hall, New Jersey.\n\nCrisan, D. and Doucet, A. (2000). Convergence of generalized particle filters, Technical\nReport CUED/F-INFENG/TR 381, Cambridge University Engineering Department.\n\nde Freitas, J. F. G. (1999).  
Bayesian Methods for Neural Networks, PhD thesis, Department\nof Engineering, Cambridge University, Cambridge, UK.\n\nde Freitas, J. F. G., Niranjan, M., Gee, A. H. and Doucet, A. (2000). Sequential Monte\nCarlo methods to train neural network models, Neural Computation 12(4): 955-993.\n\nDoucet, A., de Freitas, J. F. G. and Gordon, N. J. (eds) (2001). Sequential Monte Carlo\nMethods in Practice, Springer-Verlag.\n\nDoucet, A., Godsill, S. and Andrieu, C. (2000). On sequential Monte Carlo sampling\nmethods for Bayesian filtering, Statistics and Computing 10(3): 197-208.\n\nGelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (1995). Bayesian Data Analysis,\nChapman and Hall.\n\nGordon, N. J., Salmond, D. J. and Smith, A. F. M. (1993). Novel approach to\nnonlinear/non-Gaussian Bayesian state estimation, IEE Proceedings-F 140(2): 107-113.\n\nJulier, S. J. and Uhlmann, J. K. (1997). A new extension of the Kalman filter\nto nonlinear systems, Proc. of AeroSense: The 11th International Symposium on\nAerospace/Defence Sensing, Simulation and Controls, Orlando, Florida, Vol. Multi\nSensor Fusion, Tracking and Resource Management II.\n\nPitt, M. K. and Shephard, N. (1999). Filtering via simulation: Auxiliary particle filters,\nJournal of the American Statistical Association 94(446): 590-599.\n\nThrun, S. (2000). Monte Carlo POMDPs, in S. Solla, T. Leen and K.-R. M\u00fcller (eds),\nAdvances in Neural Information Processing Systems 12, MIT Press, pp. 1064-1070.\n\nvan der Merwe, R., Doucet, A., de Freitas, J. F. G. and Wan, E. (2000). The unscented\nparticle filter, Technical Report CUED/F-INFENG/TR 380, Cambridge University\nEngineering Department. 
\n\n\f", "award": [], "sourceid": 1818, "authors": [{"given_name": "Rudolph", "family_name": "van der Merwe", "institution": null}, {"given_name": "Arnaud", "family_name": "Doucet", "institution": null}, {"given_name": "Nando", "family_name": "de Freitas", "institution": null}, {"given_name": "Eric", "family_name": "Wan", "institution": null}]}