{"title": "The CONDENSATION Algorithm - Conditional Density Propagation and Applications to Visual Tracking", "book": "Advances in Neural Information Processing Systems", "page_first": 361, "page_last": 367, "abstract": null, "full_text": "The CONDENSATION algorithm - conditional density propagation and applications to visual tracking\n\nA. Blake and M. Isard\n\nDepartment of Engineering Science,\nUniversity of Oxford,\nOxford OX1 3PJ, UK.\n\nAbstract\n\nThe power of sampling methods in Bayesian reconstruction of noisy signals is well known. The extension of sampling to temporal problems is discussed. Efficacy of sampling over time is demonstrated with visual tracking.\n\n1 INTRODUCTION\n\nThe problem of tracking curves in dense visual clutter is a challenging one. Trackers based on Kalman filters are of limited power; because they are based on Gaussian densities, which are unimodal, they cannot represent simultaneous alternative hypotheses. Extensions to the Kalman filter to handle multiple data associations (Bar-Shalom and Fortmann, 1988) work satisfactorily in the simple case of point targets but do not extend naturally to continuous curves.\n\nTracking is the propagation of shape and motion estimates over time, driven by a temporal stream of observations. The noisy observations that arise in realistic problems demand a robust approach involving propagation of probability distributions over time. Modest levels of noise may be treated satisfactorily using Gaussian densities, and this is achieved effectively by Kalman filtering (Gelb, 1974). More pervasive noise distributions, as commonly arise in visual background clutter, demand a more powerful, non-Gaussian approach.\n\nOne very effective approach is to use random sampling.  
The CONDENSATION algorithm, described here, combines random sampling with learned dynamical models to propagate an entire probability distribution for object position and shape, over time. The result is accurate tracking of agile motion in clutter, decidedly more robust than what has previously been attainable by Kalman filtering. Despite the use of random sampling, the algorithm is efficient, running in near real-time when applied to visual tracking.\n\n\u2022 Web: http://www.robots.ox.ac.uk/ ... ab/\n\n2 SAMPLING METHODS\n\nA standard problem in statistical pattern recognition is to find an object parameterised as x with prior p(x), using data z from a single image. The posterior density p(x | z) represents all the knowledge about x that is deducible from the data. It can be evaluated in principle by applying Bayes' rule (Papoulis, 1990) to obtain\n\np(x | z) = k p(z | x) p(x)    (1)\n\nwhere k is a normalisation constant that is independent of x. However p(z | x) may become sufficiently complex that p(x | z) cannot be evaluated simply in closed form. Such complexity arises typically in visual clutter, when the superfluity of observable features tends to suggest multiple, competing hypotheses for x. A one-dimensional illustration of the problem is given in figure 1, in which multiple features give\n\n[Figure 1 plot: measured features z_1, ..., z_6 marked along the x-axis; observation density p(z | x) plotted against x.]\n\nFigure 1: One-dimensional observation model. A probabilistic observation model allowing for clutter and the possibility of missing the target altogether is specified here as a conditional density p(z | x).  
\n\nrise to a multimodal observation density function p(z | x).\n\nWhen direct evaluation of p(x | z) is infeasible, iterative sampling techniques can be used (Geman and Geman, 1984; Ripley and Sutherland, 1990; Grenander et al., 1991; Storvik, 1994). The factored sampling algorithm (Grenander et al., 1991) generates a random variate x from a distribution \\tilde{p}(x) that approximates the posterior p(x | z). First a sample-set {s^{(1)}, ..., s^{(N)}} is generated from the prior density p(x) and then a sample x = s^{(i)}, i \\in {1, ..., N}, is chosen with probability\n\n\\pi_i = \\frac{p(z | x = s^{(i)})}{\\sum_{j=1}^{N} p(z | x = s^{(j)})}.\n\nSampling methods have proved remarkably effective for recovering static objects from cluttered images. For such problems x is multi-dimensional, a set of parameters for curve position and shape. In that case the sample-set {s^{(1)}, ..., s^{(N)}} represents a distribution of x-values which can be seen as a distribution of curves in the image plane, as in figure 2.\n\nFigure 2: Sample-set representation of shape distributions for a curve with parameters x, modelling the outline (a) of the head of a dancing girl. Each sample s^{(n)} is shown as a curve (of varying position and shape) with a thickness proportional to the weight \\pi^{(n)}. The weighted mean of the sample set (b) serves as an estimator of mean shape.\n\n3 THE CONDENSATION ALGORITHM\n\nThe CONDENSATION algorithm is based on factored sampling but extended to apply iteratively to successive images in a sequence. Similar sampling strategies have appeared elsewhere (Gordon et al., 1993; Kitagawa, 1996), presented as developments of Monte-Carlo methods. The methods outlined here are described in detail elsewhere.  
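
As a rough illustration of the factored sampling step of section 2, the sketch below draws N samples from a prior and then picks one with probability proportional to its likelihood. The scalar state, the Gaussian prior, the two-peak likelihood and every function name here are hypothetical choices made for this example; none of them come from the paper.

```python
import math
import random


def factored_sample(sample_prior, likelihood, n, rng):
    # Draw the sample-set s(1), ..., s(N) from the prior p(x).
    samples = [sample_prior(rng) for _ in range(n)]
    # Unnormalised weights p(z | x = s(i)).
    raw = [likelihood(s) for s in samples]
    total = sum(raw)
    if total <= 0.0:
        raise ValueError('all samples have zero likelihood')
    # Normalised weights pi_i.
    weights = [w / total for w in raw]
    # Choose one sample with probability pi_i.
    chosen = rng.choices(samples, weights=weights, k=1)[0]
    return samples, weights, chosen


def toy_prior(rng):
    # Hypothetical prior: a broad Gaussian over a scalar state x.
    return rng.gauss(0.0, 3.0)


def toy_likelihood(x):
    # Hypothetical multimodal observation density with features near -2 and 2.
    return sum(math.exp(-0.5 * ((x - z) / 0.3) ** 2) for z in (-2.0, 2.0))


rng = random.Random(0)
samples, weights, chosen = factored_sample(toy_prior, toy_likelihood, 500, rng)
```

Samples that fall near either feature receive most of the weight, so the weighted set keeps both competing hypotheses rather than committing to one.
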
Fuller descriptions and derivation of the CONDENSATION algorithm are in (Isard and Blake, 1996; Blake and Isard, 1997), and details of the learning of dynamical models, which is crucial to the effective operation of the algorithm, are in (Blake et al., 1995).\n\nGiven that the estimation process at each time-step is a self-contained iteration of factored sampling, the output of an iteration will be a weighted, time-stamped sample-set, denoted s_t^{(n)}, n = 1, ..., N with weights \\pi_t^{(n)}, representing approximately the conditional state-density p(x_t | Z_t) at time t, where Z_t = (z_1, ..., z_t). How is this sample-set obtained? Clearly the process must begin with a prior density and the effective prior for time-step t should be p(x_t | Z_{t-1}). This prior is of course multi-modal in general and no functional representation of it is available. It is derived from the sample set representation (s_{t-1}^{(n)}, \\pi_{t-1}^{(n)}), n = 1, ..., N of p(x_{t-1} | Z_{t-1}), the output from the previous time-step, to which prediction must then be applied.\n\nThe iterative process applied to the sample-sets is depicted in figure 3. At the top of the diagram, the output from time-step t - 1 is the weighted sample-set {(s_{t-1}^{(n)}, \\pi_{t-1}^{(n)}), n = 1, ..., N}. The aim is to maintain, at successive time-steps, sample sets of fixed size N, so that the algorithm can be guaranteed to run within a given computational resource. The first operation therefore is to sample (with\n\n[Figure 3 diagram, with density labels p(x_t | Z_{t-1}) and p(x_t | Z_t).]\n\nFigure 3: One time-step in the CONDENSATION algorithm. Blob centres represent sample values and sizes depict sample weights.  
\n\nreplacement) N times from the set {s_{t-1}^{(n)}}, choosing a given element with probability \\pi_{t-1}^{(n)}. Some elements, especially those with high weights, may be chosen several times, leading to identical copies of elements in the new set. Others with relatively low weights may not be chosen at all.\n\nEach element chosen from the new set is now subjected to a predictive step. (The dynamical model we generally use for prediction is a linear stochastic differential equation (s.d.e.) learned from training sets of sample object motion (Blake et al., 1995).) The predictive step includes a random component, so identical elements may now split as each undergoes its own independent random motion step. At this stage, the sample set {s_t^{(n)}} for the new time-step has been generated but, as yet, without its weights; it is approximately a fair random sample from the effective prior density p(x_t | Z_{t-1}) for time-step t. Finally, the observation step from factored sampling is applied, generating weights from the observation density p(z_t | x_t) to obtain the sample-set representation {(s_t^{(n)}, \\pi_t^{(n)})} of state-density for time t.\n\nThe algorithm is specified in detail in figure 4. The process for a single time-step consists of N iterations to generate the N elements of the new sample set. Each iteration has three steps, detailed in the figure, and we comment below on each.\n\n1. Select nth new sample s_t^{\\prime(n)} to be some s_{t-1}^{(j)} from the old sample set, sampled with replacement with probability \\pi_{t-1}^{(j)}. This is achieved efficiently by using cumulative weights c_{t-1}^{(j)} (constructed in step 3).\n\n2. Predict by sampling randomly from the conditional density for the dynamical model to generate a sample for the new sample-set.\n\n3. Measure in order to generate weights \\pi_t^{(n)} for the new sample.  
Each weight is evaluated from the observation density function which, being multimodal in general, \"infuses\" multimodality into the state density.\n\nIterate\n\nFrom the \"old\" sample-set {s_{t-1}^{(n)}, \\pi_{t-1}^{(n)}, c_{t-1}^{(n)}, n = 1, ..., N} at time-step t - 1, construct a \"new\" sample-set {s_t^{(n)}, \\pi_t^{(n)}, c_t^{(n)}}, n = 1, ..., N for time t.\n\nConstruct the nth of N new samples as follows:\n\n1. Select a sample s_t^{\\prime(n)} as follows:\n\n(a) generate a random number r \\in [0, 1], uniformly distributed.\n(b) find, by binary subdivision, the smallest j for which c_{t-1}^{(j)} \\geq r\n(c) set s_t^{\\prime(n)} = s_{t-1}^{(j)}\n\n2. Predict by sampling from\n\np(x_t | x_{t-1} = s_t^{\\prime(n)})\n\nto choose each s_t^{(n)}.\n\n3. Measure and weight the new position in terms of the measured features z_t:\n\n\\pi_t^{(n)} = p(z_t | x_t = s_t^{(n)})\n\nthen normalise so that \\sum_n \\pi_t^{(n)} = 1 and store together with cumulative probability as (s_t^{(n)}, \\pi_t^{(n)}, c_t^{(n)}) where\n\nc_t^{(0)} = 0,\nc_t^{(n)} = c_t^{(n-1)} + \\pi_t^{(n)}    (n = 1 ... N).\n\nFigure 4: The CONDENSATION algorithm.\n\nAt any time-step, it is possible to \"report\" on the current state, for example by evaluating some moment of the state density as\n\n\\mathcal{E}[f(x_t)] = \\sum_{n=1}^{N} \\pi_t^{(n)} f(s_t^{(n)}).    (2)\n\n4 RESULTS\n\nA good deal of experimentation has been performed in applying the CONDENSATION algorithm to the tracking of visual motion, including moving hands and dancing figures. Perhaps one of the most stringent tests was the tracking of a leaf on a bush, in which the foreground leaf is effectively camouflaged against the background.\n\nA 12 second (600 field) sequence shows a bush blowing in the wind, the task being to track one particular leaf. A template was drawn by hand around a still of one chosen leaf and allowed to undergo affine deformations during tracking.  
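
For orientation, the select, predict and measure steps of figure 4, together with the moment estimate of equation (2), might be sketched as follows. This is a minimal scalar-state illustration, not the authors' implementation: the random-walk dynamics and the single-peak Gaussian observation density are hypothetical stand-ins for the learned dynamical model and the clutter-aware observation model, and all function names are invented for the example.

```python
import bisect
import math
import random


def condensation_step(samples, weights, predict, likelihood, rng):
    # One time-step on a weighted sample-set: select, predict, measure.
    n = len(samples)
    # Cumulative weights c(j) = c(j-1) + pi(j), with c(0) = 0 left implicit.
    cumulative = []
    running = 0.0
    for w in weights:
        running += w
        cumulative.append(running)
    new_samples = []
    raw = []
    for _ in range(n):
        # 1. Select: smallest j with c(j) >= r, found by binary search.
        r = rng.random() * cumulative[-1]
        j = min(bisect.bisect_left(cumulative, r), n - 1)
        # 2. Predict: sample from p(x_t | x_{t-1} = selected sample).
        s = predict(samples[j], rng)
        # 3. Measure: weight by the observation density p(z_t | x_t = s).
        new_samples.append(s)
        raw.append(likelihood(s))
    total = sum(raw)
    if total <= 0.0:
        raise ValueError('all new samples have zero likelihood')
    new_weights = [w / total for w in raw]
    return new_samples, new_weights


def moment(samples, weights, f):
    # Equation (2): weighted sum of f over the sample-set.
    return sum(w * f(s) for s, w in zip(samples, weights))


def toy_predict(x, rng):
    # Hypothetical dynamics: a random walk standing in for the learned model.
    return x + rng.gauss(0.0, 0.5)


def toy_likelihood(x, z=1.0):
    # Hypothetical observation density centred on a fixed measurement z.
    return math.exp(-0.5 * ((x - z) / 0.2) ** 2)


rng = random.Random(1)
n = 400
samples = [rng.gauss(0.0, 1.0) for _ in range(n)]
weights = [1.0 / n] * n
for _ in range(5):
    samples, weights = condensation_step(
        samples, weights, toy_predict, toy_likelihood, rng)
mean_x = moment(samples, weights, lambda x: x)
```

The sample-set size stays fixed at N from one time-step to the next, which is what bounds the cost per step. In this toy run the weighted mean settles near the assumed measurement at 1.0.
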
Given that a clutter-free training sequence is not available, the motion model was learned by means of a bootstrap procedure (Blake et al., 1995). A tracker with default dynamics proved capable of tracking the first 150 fields of a training sequence before losing", "award": [], "sourceid": 1289, "authors": [{"given_name": "Andrew", "family_name": "Blake", "institution": null}, {"given_name": "Michael", "family_name": "Isard", "institution": null}]}