{"title": "The Unscented Particle Filter", "book": "Advances in Neural Information Processing Systems", "page_first": 584, "page_last": 590, "abstract": null, "full_text": "The Unscented Particle Filter \n\nRudolph van der Merwe \nOregon Graduate Institute \nElectrical and Computer Engineering \nP.O. Box 91000, Portland, OR 97006, USA \nrvdmerwe@ece.ogi.edu \n\nArnaud Doucet \nCambridge University \nEngineering Department \nCambridge CB2 1PZ, England \nad2@eng.cam.ac.uk \n\nNando de Freitas \nUC Berkeley, Computer Science \n387 Soda Hall, Berkeley \nCA 94720-1776, USA \njfgf@cs.berkeley.edu \n\nEric Wan \nOregon Graduate Institute \nElectrical and Computer Engineering \nP.O. Box 91000, Portland, OR 97006, USA \nericwan@ece.ogi.edu \n\nAbstract \n\nIn this paper, we propose a new particle filter based on sequential importance sampling. The algorithm uses a bank of unscented filters to obtain the importance proposal distribution. This proposal has two very \"nice\" properties. Firstly, it makes efficient use of the latest available information and, secondly, it can have heavy tails. As a result, we find that the algorithm outperforms standard particle filtering and other nonlinear filtering methods very substantially. This experimental finding is in agreement with the theoretical convergence proof for the algorithm. The algorithm also includes resampling and (possibly) Markov chain Monte Carlo (MCMC) steps. \n\n1 Introduction \n\nFiltering is the problem of estimating the states (parameters or hidden variables) of a system as a set of observations becomes available on-line. This problem is of paramount importance in many fields of science, engineering and finance. To solve it, one begins by modelling the evolution of the system and the noise in the measurements. The resulting models typically exhibit complex nonlinearities and non-Gaussian distributions, thus precluding analytical solution. 
\n\nThe best known algorithm to solve the problem of non-Gaussian, nonlinear filtering (filtering for short) is the extended Kalman filter (Anderson and Moore 1979). This filter is based upon the principle of linearising the measurement and evolution models using Taylor series expansions. The series approximations in the EKF algorithm can, however, lead to poor representations of the nonlinear functions and probability distributions of interest. As a result, this filter can diverge. \n\nRecently, Julier and Uhlmann (Julier and Uhlmann 1997) have introduced a filter founded on the intuition that it is easier to approximate a Gaussian distribution than it is to approximate arbitrary nonlinear functions. They named this filter the unscented Kalman filter (UKF). They have shown that the UKF leads to more accurate results than the EKF and that, in particular, it generates much better estimates of the covariance of the states (the EKF seems to underestimate this quantity). The UKF has, however, the limitation that it does not apply to general non-Gaussian distributions. \n\nAnother popular solution strategy for the general filtering problem is to use sequential Monte Carlo methods, also known as particle filters (PFs): see for example (Doucet, Godsill and Andrieu 2000, Doucet, de Freitas and Gordon 2001, Gordon, Salmond and Smith 1993). These methods allow for a complete representation of the posterior distribution of the states, so that any statistical estimates, such as the mean, modes, kurtosis and variance, can be easily computed. They can, therefore, deal with any nonlinearities or distributions. \n\nPFs rely on importance sampling and, as a result, require the design of proposal distributions that can approximate the posterior distribution reasonably well. In general, it is hard to design such proposals. 
The most common strategy is to sample from the probabilistic model of the state evolution (the transition prior). This strategy can, however, fail if the new measurements appear in the tail of the prior or if the likelihood is too peaked in comparison to the prior. This situation does indeed arise in several areas of engineering and finance, where one can encounter sensors that are very accurate (peaked likelihoods) or data that undergoes sudden changes (non-stationarities): see for example (Pitt and Shephard 1999, Thrun 2000). To overcome this problem, several techniques based on linearisation have been proposed in the literature (de Freitas 1999, de Freitas, Niranjan, Gee and Doucet 2000, Doucet et al. 2000, Pitt and Shephard 1999). For example, in (de Freitas et al. 2000), the EKF Gaussian approximation is used as the proposal distribution for a PF. In this paper, we follow the same approach, but replace the EKF proposal by a UKF proposal. The resulting filter should perform better not only because the UKF is more accurate, but because it also allows one to control the rate at which the tails of the proposal distribution go to zero. It thus becomes possible to adopt heavier-tailed distributions as proposals and, consequently, to obtain better importance samplers (Gelman, Carlin, Stern and Rubin 1995). Readers are encouraged to consult our technical report for further results and implementation details (van der Merwe, Doucet, de Freitas and Wan 2000)^1. \n\n2 Dynamic State Space Model \n\nWe apply our algorithm to general state space models consisting of a transition equation p(x_t | x_{t-1}) and a measurement equation p(y_t | x_t). That is, the states follow a Markov process and the observations are assumed to be independent given the states. 
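In code, such a model is just a pair of sampled recursions. The sketch below simulates a generic nonlinear state-space model of this form; the particular transition f, measurement h and noise scales are illustrative assumptions, not the model studied later in the paper.

```python
import numpy as np

# Hypothetical transition model x_t = f(x_{t-1}, v_{t-1}); the functional
# form here is only an illustrative choice.
def f(x, v):
    return 0.5 * x + 25.0 * x / (1.0 + x ** 2) + v

# Hypothetical measurement model y_t = h(x_t, n_t) (no exogenous input u_t).
def h(x, n):
    return x ** 2 / 20.0 + n

def simulate(T, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal()                            # x_0 drawn from the prior p(x_0)
    xs, ys = [], []
    for _ in range(T):
        x = f(x, rng.normal(scale=1.0))         # propagate via p(x_t | x_{t-1})
        ys.append(h(x, rng.normal(scale=0.1)))  # observe via p(y_t | x_t)
        xs.append(x)
    return np.array(xs), np.array(ys)

xs, ys = simulate(60)
```

Filtering then amounts to recovering the hidden sequence xs from the observations ys alone.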
For example, if we are interested in nonlinear, non-Gaussian regression, the model can be expressed as follows \n\nx_t = f(x_{t-1}, v_{t-1}) \ny_t = h(u_t, x_t, n_t) \n\nwhere u_t ∈ R^{n_u} denotes the input data at time t, x_t ∈ R^{n_x} denotes the states (or parameters) of the model, y_t ∈ R^{n_y} the observations, v_t ∈ R^{n_v} the process noise and n_t ∈ R^{n_n} the measurement noise. The mappings f : R^{n_x} × R^{n_v} → R^{n_x} and h : (R^{n_x} × R^{n_u}) × R^{n_n} → R^{n_y} represent the deterministic process and measurement models. To complete the specification of the model, the prior distribution (at t = 0) is denoted by p(x_0). Our goal will be to approximate the posterior distribution p(x_{0:t} | y_{1:t}) and one of its marginals, the filtering density p(x_t | y_{1:t}), where y_{1:t} = {y_1, y_2, ..., y_t}. By computing the filtering density recursively, we do not need to keep track of the complete history of the states. \n\n^1 The TR and software are available at http://www.cs.berkeley.edu/~jfgf. \n\n3 Particle Filtering \n\nParticle filters allow us to approximate the posterior distribution p(x_{0:t} | y_{1:t}) using a set of N weighted samples (particles) {x_{0:t}^{(i)}, i = 1, ..., N}, which are drawn from an importance proposal distribution q(x_{0:t} | y_{1:t}). These samples are propagated in time as shown in Figure 1. In doing so, it becomes possible to map intractable integration problems (such as computing expectations and marginal distributions) to easy summations. This is done in a rigorous setting that ensures convergence according to the strong law of large numbers \n\n(1/N) sum_{i=1}^{N} f_t(x_{0:t}^{(i)}) → E_{p(x_{0:t} | y_{1:t})}[f_t(x_{0:t})] almost surely as N → ∞, \n\nwhere f_t : R^{n_x × (t+1)} → R^{n_{f_t}} is some function of interest. For example, it could be the conditional mean, in which case f_t(x_{0:t}) = x_{0:t}, or the conditional covariance of x_t, with f_t(x_{0:t}) = x_t x_t' − E_{p(x_t | y_{1:t})}[x_t] E'_{p(x_t | y_{1:t})}[x_t]. 
Figure 1: In this example, a particle filter starts at time t-1 with an unweighted measure {x_{t-1}^{(i)}, N^{-1}}, which provides an approximation of p(x_{t-1} | y_{1:t-2}). For each particle we compute the importance weights using the information at time t-1. This results in the weighted measure {x_{t-1}^{(i)}, w_{t-1}^{(i)}}, which yields an approximation of p(x_{t-1} | y_{1:t-1}). Subsequently, a resampling step selects only the \"fittest\" particles to obtain the unweighted measure {x_{t-1}^{(i)}, N^{-1}}, which is still an approximation of p(x_{t-1} | y_{1:t-1}). Finally, the sampling (prediction) step introduces variety, resulting in the measure {x_t^{(i)}, N^{-1}}. \n\nA generic PF algorithm involves the following steps. \n\nGeneric PF \n\n1. Sequential importance sampling step \n• For i = 1, ..., N, sample x_t^{(i)} ~ q(x_t | x_{0:t-1}^{(i)}, y_{1:t}) and update the trajectories x_{0:t}^{(i)} = (x_t^{(i)}, x_{0:t-1}^{(i)}). \n• For i = 1, ..., N, evaluate the importance weights up to a normalizing constant: \nw_t^{(i)} = p(x_{0:t}^{(i)} | y_{1:t}) / [ q(x_t^{(i)} | x_{0:t-1}^{(i)}, y_{1:t}) p(x_{0:t-1}^{(i)} | y_{1:t-1}) ] \n• For i = 1, ..., N, normalize the weights: w_t^{(i)} ← w_t^{(i)} [ sum_{j=1}^{N} w_t^{(j)} ]^{-1}. \n\n2. Selection step \n• Multiply/suppress samples x_{0:t}^{(i)} with high/low importance weights w_t^{(i)}, respectively, to obtain N random samples x_{0:t}^{(i)} approximately distributed according to p(x_{0:t} | y_{1:t}). \n\n3. MCMC step \n• Apply a Markov transition kernel with invariant distribution given by p(x_{0:t} | y_{1:t}) to obtain x_{0:t}^{(i)}. 
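A minimal sketch of these three steps for a scalar model, using the transition prior as the proposal (so the unnormalized weight is just the likelihood) and multinomial resampling; the AR(1)-plus-Gaussian-noise model inside is an assumption chosen only to keep the example self-contained, and the MCMC step is left as a comment.

```python
import numpy as np

def generic_pf(ys, N=200, seed=0):
    """Sequential importance sampling + selection for a scalar state-space
    model; the linear-Gaussian model here is an illustrative assumption,
    not the model used in the paper's experiment."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=N)               # particles drawn from p(x_0)
    means = []
    for y in ys:
        # 1. Importance sampling: propose from the transition prior.
        x = 0.9 * x + rng.normal(size=N)
        # With the prior as proposal, w_t is proportional to p(y_t | x_t).
        logw = -0.5 * (y - x) ** 2       # unit-variance Gaussian likelihood
        w = np.exp(logw - logw.max())    # subtract max for numerical stability
        w /= w.sum()                     # normalize the weights
        means.append(np.sum(w * x))      # posterior-mean estimate
        # 2. Selection: multinomial resampling to an unweighted measure.
        x = x[rng.choice(N, size=N, p=w)]
        # 3. An MCMC move step with invariant distribution p(x_t | y_{1:t})
        #    could be applied to each particle here.
    return np.array(means)

est = generic_pf(np.zeros(10))
```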
\n\nIn the above algorithm, we can restrict ourselves to importance functions of the form q(x_{0:t} | y_{1:t}) = q(x_0) ∏_{k=1}^{t} q(x_k | y_{1:k}, x_{1:k-1}) to obtain a recursive formula to evaluate the importance weights \n\nw_t ∝ p(y_t | y_{1:t-1}, x_{0:t}) p(x_t | x_{t-1}) / q(x_t | y_{1:t}, x_{1:t-1}). \n\nThere are infinitely many possible choices for q(x_{0:t} | y_{1:t}), the only condition being that its support must include that of p(x_{0:t} | y_{1:t}). The simplest choice is to just sample from the prior, p(x_t | x_{t-1}), in which case the importance weight is equal to the likelihood, p(y_t | y_{1:t-1}, x_{0:t}). This is the most widely used distribution, since it is simple to compute, but it can be inefficient, since it ignores the most recent evidence, y_t. \n\nThe selection (resampling) step is used to eliminate the particles having low importance weights and to multiply particles having high importance weights (Gordon et al. 1993). This is done by mapping the weighted measure {x_t^{(i)}, w_t^{(i)}} to an unweighted measure {x_t^{(i)}, N^{-1}} that provides an approximation of p(x_t | y_{1:t}). After the selection scheme at time t, we obtain N particles distributed marginally approximately according to p(x_{0:t} | y_{1:t}). One can, therefore, apply a Markov kernel (for example, a Metropolis or Gibbs kernel) to each particle and the resulting distribution will still be p(x_{0:t} | y_{1:t}). This step usually allows us to obtain better results and to treat more complex models (de Freitas 1999). \n\n4 The Unscented Particle Filter \n\nAs mentioned earlier, using the transition prior as proposal distribution can be inefficient. As illustrated in Figure 2, if we fail to use the latest available information to propose new values for the states, only a few particles might survive. It is therefore of paramount importance to move the particles towards the regions of high likelihood. 
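This degeneracy can be quantified with the effective sample size N_eff = 1 / sum_i (w_t^{(i)})^2: when the likelihood is sharply peaked in the tail of the prior, almost all prior-proposed particles receive negligible weight. The Gaussian prior/likelihood and the specific numbers below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000
particles = rng.normal(0.0, 1.0, size=N)   # proposals from a broad prior

def n_eff(w):
    return 1.0 / np.sum(w ** 2)            # effective sample size

def prior_proposal_weights(y, sigma):
    """Normalized weights when proposing from the prior: w_i proportional
    to the likelihood p(y | x_i), here a Gaussian with std dev sigma."""
    logw = -0.5 * ((y - particles) / sigma) ** 2
    w = np.exp(logw - logw.max())          # guard against underflow
    return w / w.sum()

# Observation in the bulk of the prior with a broad likelihood: most
# particles keep appreciable weight.
broad = n_eff(prior_proposal_weights(y=0.0, sigma=1.0))
# Observation in the tail of the prior with a very peaked likelihood: the
# weight collapses onto the few particles that happen to land near y.
peaked = n_eff(prior_proposal_weights(y=3.0, sigma=0.05))
assert peaked < broad
```

Moving the particles toward the high-likelihood region before weighting, as the UKF proposal does, is precisely what keeps N_eff from collapsing.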
To achieve this, we propose to use the unscented filter as proposal distribution. This simply requires that we propagate the sufficient statistics of the UKF for each particle. For exact details, please refer to our technical report (van der Merwe et al. 2000). \n\nFigure 2: The UKF proposal distribution allows us to move the samples in the prior to regions of high likelihood. This is of paramount importance if the likelihood happens to lie in one of the tails of the prior distribution, or if it is too narrow (low measurement error). \n\n5 Theoretical Convergence \n\nLet B(R^n) be the space of bounded, Borel measurable functions on R^n. We denote ||f|| = sup_{x ∈ R^n} |f(x)|. The following theorem is a straightforward extension of previous results in (Crisan and Doucet 2000). \n\nTheorem 1 If the importance weight \n\nw_t ∝ p(y_t | x_t) p(x_t | x_{t-1}) / q(x_t | x_{0:t-1}, y_{1:t})  (1) \n\nis upper bounded for any (x_{t-1}, y_t), then, for all t ≥ 0, there exists c_t independent of N, such that for any f_t ∈ B(R^{n_x × (t+1)}) \n\nE[ ( (1/N) sum_{i=1}^{N} f_t(x_{0:t}^{(i)}) − E_{p(x_{0:t} | y_{1:t})}[f_t(x_{0:t})] )^2 ] ≤ c_t ||f_t||^2 / N.  (2) \n\nThe expectation in equation 2 is with respect to the randomness introduced by the particle filtering algorithm. This convergence result shows that, under very loose assumptions, convergence of the (unscented) particle filter is ensured and that the convergence rate of the method is independent of the dimension of the state space. The only crucial assumption is to ensure that w_t is upper bounded, that is, that the proposal distribution q(x_t | x_{0:t-1}, y_{1:t}) has heavier tails than p(y_t | x_t) p(x_t | x_{t-1}). Considering this theoretical result, it is not surprising that the UKF (which has heavier tails than the EKF) can yield better estimates. \n\n6 Demonstration \n\nFor this experiment, a time-series is generated by the following process model x_{t+1} 
= 1 + sin(ωπt) + φ x_t + v_t, where v_t is a Gamma(3,2) random variable modeling the process noise, and ω = 4e-2 and φ = 0.5 are scalar parameters. A non-stationary observation model, with one functional form for t ≤ 30 and another for t > 30, is used. The observation noise, n_t, is drawn from a zero-mean Gaussian distribution. Given only the noisy observations, y_t, a few different filters were used to estimate the underlying clean state sequence x_t for t = 1, ..., 60. The experiment was repeated 100 times with random re-initialization for each run. All of the particle filters used 200 particles. Table 1 summarizes the performance of the different filters. \n\nAlgorithm | MSE mean | MSE var \nExtended Kalman Filter (EKF) | 0.374 | 0.015 \nUnscented Kalman Filter (UKF) | 0.280 | 0.012 \nParticle Filter: generic | 0.424 | 0.053 \nParticle Filter: MCMC move step | 0.417 | 0.055 \nParticle Filter: EKF proposal | 0.310 | 0.016 \nParticle Filter: EKF proposal and MCMC move step | 0.307 | 0.015 \nParticle Filter: UKF proposal (\"Unscented Particle Filter\") | 0.070 | 0.006 \nParticle Filter: UKF proposal and MCMC move step | 0.074 | 0.008 \n\nTable 1: Mean and variance of the MSE calculated over 100 independent runs. \n\nThe table shows the means and variances of the mean-square-error (MSE) of the state estimates. Note that MCMC could improve results in other situations. Figure 3 compares the estimates generated from a single run of the different particle filters. The superior performance of the unscented particle filter is clearly evident. \n\nFigure 3: Plot of the state estimates generated by different filters. \n\nFigure 4 shows the estimates of the state covariance generated by a stand-alone EKF and UKF for this problem. Notice how the EKF's estimates are consistently smaller than those generated by the UKF. 
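For reference, the data-generation step of this experiment can be sketched as follows. The process model is taken directly from the text above; since the displayed observation equations did not survive extraction, the quadratic form for t ≤ 30 switching to a linear form for t > 30, the observation-noise scale, and the Gamma(3,2) parameterisation (shape 3, scale 2) are all assumptions.

```python
import numpy as np

def generate_series(T=60, seed=0):
    """Synthetic series x_{t+1} = 1 + sin(omega*pi*t) + phi*x_t + v_t with
    v_t ~ Gamma(3, 2); the piecewise observation model below (quadratic for
    t <= 30, linear for t > 30) is an assumed reconstruction."""
    rng = np.random.default_rng(seed)
    omega, phi = 4e-2, 0.5
    states, obs = np.zeros(T), np.zeros(T)
    x = 0.0
    for t in range(T):
        v = rng.gamma(shape=3.0, scale=2.0)      # Gamma(3,2) process noise
        x = 1.0 + np.sin(omega * np.pi * t) + phi * x + v
        n = rng.normal(scale=1e-2)               # small zero-mean Gaussian noise
        if t + 1 <= 30:
            y = phi * x ** 2 + n                 # assumed form for t <= 30
        else:
            y = phi * x - 2.0 + n                # assumed form for t > 30
        states[t], obs[t] = x, y
    return states, obs

states, obs = generate_series()
```

Each filter is then run on obs alone and scored by the MSE of its state estimates against states, mirroring Table 1.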
This property makes the UKF better suited than the EKF for proposal distribution generation within the particle filter framework. \n\nFigure 4: EKF and UKF estimates of state covariance. \n\n7 Conclusions \n\nWe proposed a new particle filter that uses unscented filters as proposal distributions. The convergence proof and empirical evidence clearly demonstrate that this algorithm can lead to substantial improvements over other nonlinear filtering algorithms. The algorithm is well suited for engineering applications, where the sensors are very accurate but nonlinear, and for financial time series, where outliers and heavy-tailed distributions play a significant role in the analysis of the data. For further details and experiments, please refer to our report (van der Merwe et al. 2000). \n\nReferences \n\nAnderson, B. D. and Moore, J. B. (1979). Optimal Filtering, Prentice-Hall, New Jersey. \n\nCrisan, D. and Doucet, A. (2000). Convergence of generalized particle filters, Technical Report CUED/F-INFENG/TR 381, Cambridge University Engineering Department. \n\nde Freitas, J. F. G. (1999). Bayesian Methods for Neural Networks, PhD thesis, Department of Engineering, Cambridge University, Cambridge, UK. \n\nde Freitas, J. F. G., Niranjan, M., Gee, A. H. and Doucet, A. (2000). Sequential Monte Carlo methods to train neural network models, Neural Computation 12(4): 955-993. \n\nDoucet, A., de Freitas, J. F. G. and Gordon, N. J. (eds) (2001). Sequential Monte Carlo Methods in Practice, Springer-Verlag. \n\nDoucet, A., Godsill, S. and Andrieu, C. (2000). On sequential Monte Carlo sampling methods for Bayesian filtering, Statistics and Computing 10(3): 197-208. \n\nGelman, A., Carlin, J. 
B., Stern, H. S. and Rubin, D. B. (1995). Bayesian Data Analysis, Chapman and Hall. \n\nGordon, N. J., Salmond, D. J. and Smith, A. F. M. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation, IEE Proceedings-F 140(2): 107-113. \n\nJulier, S. J. and Uhlmann, J. K. (1997). A new extension of the Kalman filter to nonlinear systems, Proc. of AeroSense: The 11th International Symposium on Aerospace/Defence Sensing, Simulation and Controls, Orlando, Florida, Vol. Multi Sensor Fusion, Tracking and Resource Management II. \n\nPitt, M. K. and Shephard, N. (1999). Filtering via simulation: Auxiliary particle filters, Journal of the American Statistical Association 94(446): 590-599. \n\nThrun, S. (2000). Monte Carlo POMDPs, in S. Solla, T. Leen and K.-R. Müller (eds), Advances in Neural Information Processing Systems 12, MIT Press, pp. 1064-1070. \n\nvan der Merwe, R., Doucet, A., de Freitas, J. F. G. and Wan, E. (2000). The unscented particle filter, Technical Report CUED/F-INFENG/TR 380, Cambridge University Engineering Department.", "award": [], "sourceid": 1818, "authors": [{"given_name": "Rudolph", "family_name": "van der Merwe", "institution": null}, {"given_name": "Arnaud", "family_name": "Doucet", "institution": null}, {"given_name": "Nando", "family_name": "de Freitas", "institution": null}, {"given_name": "Eric", "family_name": "Wan", "institution": null}]}