{"title": "The CONDENSATION Algorithm - Conditional Density Propagation and Applications to Visual Tracking", "book": "Advances in Neural Information Processing Systems", "page_first": 361, "page_last": 367, "abstract": null, "full_text": "The CONDENSATION algorithm - conditional density propagation and applications to visual tracking\n\nA. Blake and M. Isard\u2022\n\nDepartment of Engineering Science, University of Oxford, Oxford OX1 3PJ, UK.\n\nAbstract\n\nThe power of sampling methods in Bayesian reconstruction of noisy signals is well known. The extension of sampling to temporal problems is discussed. Efficacy of sampling over time is demonstrated with visual tracking.\n\n1 INTRODUCTION\n\nThe problem of tracking curves in dense visual clutter is a challenging one. Trackers based on Kalman filters are of limited power; because they are based on Gaussian densities, which are unimodal, they cannot represent simultaneous alternative hypotheses. Extensions to the Kalman filter to handle multiple data associations (Bar-Shalom and Fortmann, 1988) work satisfactorily in the simple case of point targets but do not extend naturally to continuous curves.\n\nTracking is the propagation of shape and motion estimates over time, driven by a temporal stream of observations. The noisy observations that arise in realistic problems demand a robust approach involving propagation of probability distributions over time. Modest levels of noise may be treated satisfactorily using Gaussian densities, and this is achieved effectively by Kalman filtering (Gelb, 1974). More pervasive noise distributions, as commonly arise in visual background clutter, demand a more powerful, non-Gaussian approach.\n\nOne very effective approach is to use random sampling. 
The CONDENSATION algorithm, described here, combines random sampling with learned dynamical models to propagate an entire probability distribution for object position and shape over time. The result is accurate tracking of agile motion in clutter, decidedly more robust than what has previously been attainable by Kalman filtering. Despite the use of random sampling, the algorithm is efficient, running in near real-time when applied to visual tracking.\n\n\u2022 Web: http://www.robots.ox.ac.uk/ ... ab/\n\n2 SAMPLING METHODS\n\nA standard problem in statistical pattern recognition is to find an object parameterised as x with prior p(x), using data z from a single image. The posterior density p(x|z) represents all the knowledge about x that is deducible from the data. It can be evaluated in principle by applying Bayes' rule (Papoulis, 1990) to obtain\n\np(x|z) = k p(z|x) p(x)   (1)\n\nwhere k is a normalisation constant that is independent of x. However p(z|x) may become sufficiently complex that p(x|z) cannot be evaluated simply in closed form. Such complexity arises typically in visual clutter, when the superfluity of observable features tends to suggest multiple, competing hypotheses for x. A one-dimensional illustration of the problem is shown in figure 1, in which multiple features give\n\n[Figure 1: One-dimensional observation model. A probabilistic observation model allowing for clutter and the possibility of missing the target altogether is specified here as a conditional density p(z|x). Measured features z_1, ..., z_6 are marked along the x axis.]\n\nrise to a multimodal observation density function p(z|x). 
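The kind of clutter observation model sketched in figure 1 can be written down numerically. The following is a minimal illustrative sketch, not the paper's own model: the Gaussian response width SIGMA, the constant clutter floor Q (which allows for missing the target altogether), and the feature positions are all assumed values.

```python
import math

# Assumed constants for the sketch, not values from the paper.
SIGMA = 0.03  # width of the response around each measured feature
Q = 0.1       # constant floor: target missed / pure clutter

def observation_density(z_features, x):
    """Illustrative 1-D clutter likelihood p(z | x): a constant floor Q plus
    a Gaussian response centred on each measured feature z_1..z_M.  Several
    features produce a multimodal density in x, as in figure 1."""
    gauss = sum(math.exp(-0.5 * ((z - x) / SIGMA) ** 2) for z in z_features)
    return Q + gauss / (SIGMA * math.sqrt(2.0 * math.pi))

features = [0.10, 0.35, 0.40, 0.62, 0.80, 0.95]
# The density peaks near measured features and falls to the floor Q elsewhere,
# so several competing hypotheses for x are all assigned appreciable weight.
assert observation_density(features, 0.35) > observation_density(features, 0.50)
```

Each mode of this density is a competing hypothesis for the true x, which is exactly what a unimodal Gaussian model cannot represent.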
\n\nWhen direct evaluation of p(x|z) is infeasible, iterative sampling techniques can be used (Geman and Geman, 1984; Ripley and Sutherland, 1990; Grenander et al., 1991; Storvik, 1994). The factored sampling algorithm (Grenander et al., 1991) generates a random variate x from a distribution that approximates the posterior p(x|z). First a sample-set {s^(1), ..., s^(N)} is generated from the prior density p(x) and then a sample x = s^(i), i in {1, ..., N}, is chosen with probability\n\n\u03c0_i = p(z|x = s^(i)) / \u03a3_{j=1}^{N} p(z|x = s^(j)).\n\nSampling methods have proved remarkably effective for recovering static objects from cluttered images. For such problems x is multi-dimensional, a set of parameters for curve position and shape. In that case the sample-set {s^(1), ..., s^(N)} represents a distribution of x-values which can be seen as a distribution of curves in the image plane, as in figure 2.\n\nFigure 2: Sample-set representation of shape distributions for a curve with parameters x, modelling the outline (a) of the head of a dancing girl. Each sample s^(n) is shown as a curve (of varying position and shape) with a thickness proportional to the weight \u03c0^(n). The weighted mean of the sample set (b) serves as an estimator of mean shape.\n\n3 THE CONDENSATION ALGORITHM\n\nThe CONDENSATION algorithm is based on factored sampling but extended to apply iteratively to successive images in a sequence. Similar sampling strategies have appeared elsewhere (Gordon et al., 1993; Kitagawa, 1996), presented as developments of Monte-Carlo methods. The methods outlined here are described in detail elsewhere. 
Fuller descriptions and derivation of the CONDENSATION algorithm are in (Isard and Blake, 1996; Blake and Isard, 1997), and details of the learning of dynamical models, which is crucial to the effective operation of the algorithm, are in (Blake et al., 1995).\n\nGiven that the estimation process at each time-step is a self-contained iteration of factored sampling, the output of an iteration will be a weighted, time-stamped sample-set, denoted s_t^(n), n = 1, ..., N, with weights \u03c0_t^(n), representing approximately the conditional state-density p(x_t|Z_t) at time t, where Z_t = (z_1, ..., z_t). How is this sample-set obtained? Clearly the process must begin with a prior density, and the effective prior for time-step t should be p(x_t|Z_{t-1}). This prior is of course multi-modal in general and no functional representation of it is available. It is derived from the sample-set representation (s_{t-1}^(n), \u03c0_{t-1}^(n)), n = 1, ..., N, of p(x_{t-1}|Z_{t-1}), the output from the previous time-step, to which prediction must then be applied.\n\n[Figure 3: One time-step in the CONDENSATION algorithm, transforming p(x_t|Z_{t-1}) into p(x_t|Z_t). Blob centres represent sample values and sizes depict sample weights.]\n\nThe iterative process applied to the sample-sets is depicted in figure 3. At the top of the diagram, the output from time-step t-1 is the weighted sample-set {(s_{t-1}^(n), \u03c0_{t-1}^(n)), n = 1, ..., N}. The aim is to maintain, at successive time-steps, sample sets of fixed size N, so that the algorithm can be guaranteed to run within a given computational resource. The first operation therefore is to sample (with replacement) N times from the set {s_{t-1}^(n)}, choosing a given element with probability \u03c0_{t-1}^(n). Some elements, especially those with high weights, may be chosen several times, leading to identical copies of elements in the new set. Others with relatively low weights may not be chosen at all. 
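The select-predict-measure cycle just described can be sketched in code. This is a minimal sketch under stated assumptions: the random-walk dynamics and the Gaussian observation density below are illustrative stand-ins for the learned stochastic dynamical model and clutter observation model of the paper, which the caller would supply.

```python
import bisect
import math
import random

def condensation_step(samples, weights, predict, likelihood, z_t):
    """One CONDENSATION time-step on a weighted sample-set.
    `predict` applies a stochastic dynamical step to a single sample;
    `likelihood` evaluates the observation density p(z_t | x_t)."""
    N = len(samples)
    # Cumulative weights c^(n) permit O(log N) selection by binary subdivision.
    cum, c = [], 0.0
    for w in weights:
        c += w
        cum.append(c)
    new_samples, new_weights = [], []
    for _ in range(N):
        # 1. Select: sample with replacement, probability proportional to weight.
        r = random.random() * cum[-1]
        j = min(bisect.bisect_left(cum, r), N - 1)
        # 2. Predict: each copy takes its own independent random motion step,
        #    so identical copies of a high-weight element can split apart.
        s = predict(samples[j])
        # 3. Measure: weight the predicted sample by the observation density.
        new_samples.append(s)
        new_weights.append(likelihood(z_t, s))
    total = sum(new_weights)
    return new_samples, [w / total for w in new_weights]

# Toy 1-D run: random-walk dynamics, Gaussian observation model (assumed).
random.seed(0)
predict = lambda x: x + random.gauss(0.0, 0.05)
likelihood = lambda z, x: math.exp(-0.5 * ((z - x) / 0.1) ** 2)
samples = [random.uniform(0.0, 1.0) for _ in range(500)]
weights = [1.0 / 500] * 500
samples, weights = condensation_step(samples, weights, predict, likelihood, 0.7)
# A moment of the state density, e.g. the mean, is a weighted sum over samples.
mean = sum(w * s for w, s in zip(weights, samples))
```

After one step from a uniform prior, the weighted mean concentrates near the observation at 0.7; because the sample-set size N is fixed, the cost per time-step is bounded regardless of how multimodal the state density becomes.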
\n\nEach element chosen from the new set is now subjected to a predictive step. (The dynamical model we generally use for prediction is a linear stochastic differential equation (s.d.e.) learned from training sets of sample object motion (Blake et al., 1995).) The predictive step includes a random component, so identical elements may now split as each undergoes its own independent random motion step. At this stage, the sample set {s_t^(n)} for the new time-step has been generated but, as yet, without its weights; it is approximately a fair random sample from the effective prior density p(x_t|Z_{t-1}) for time-step t. Finally, the observation step from factored sampling is applied, generating weights from the observation density p(z_t|x_t) to obtain the sample-set representation {(s_t^(n), \u03c0_t^(n))} of state-density for time t.\n\nThe algorithm is specified in detail in figure 4. The process for a single time-step consists of N iterations to generate the N elements of the new sample set. Each iteration has three steps, detailed in the figure, and we comment below on each.\n\n1. Select the nth new sample s'_t^(n) to be some s_{t-1}^(j) from the old sample set, sampled with replacement with probability \u03c0_{t-1}^(j). This is achieved efficiently by using cumulative weights c_{t-1}^(j) (constructed in step 3).\n\n2. Predict by sampling randomly from the conditional density for the dynamical model to generate a sample for the new sample-set.\n\n3. Measure in order to generate weights \u03c0_t^(n) for the new sample. Each weight is evaluated from the observation density function which, being multimodal in general, \"infuses\" multimodality into the state density.\n\nIterate\n\nFrom the \"old\" sample-set {s_{t-1}^(n), \u03c0_{t-1}^(n), c_{t-1}^(n), n = 1, ..., N} at time-step t-1, construct a \"new\" sample-set {s_t^(n), \u03c0_t^(n), c_t^(n)}, n = 1, ..., N, for time t. Construct the nth of N new samples as follows:\n\n1. 
Select a sample s'_t^(n) as follows:\n\n(a) generate a random number r in [0,1], uniformly distributed;\n(b) find, by binary subdivision, the smallest j for which c_{t-1}^(j) >= r;\n(c) set s'_t^(n) = s_{t-1}^(j).\n\n2. Predict by sampling from\n\np(x_t | x_{t-1} = s'_t^(n))\n\nto choose each s_t^(n).\n\n3. Measure and weight the new position in terms of the measured features z_t:\n\n\u03c0_t^(n) = p(z_t | x_t = s_t^(n)),\n\nthen normalise so that \u03a3_n \u03c0_t^(n) = 1 and store together with cumulative probability as (s_t^(n), \u03c0_t^(n), c_t^(n)) where\n\nc_t^(0) = 0,\nc_t^(n) = c_t^(n-1) + \u03c0_t^(n)   (n = 1, ..., N).\n\nFigure 4: The CONDENSATION algorithm.\n\nAt any time-step, it is possible to \"report\" on the current state, for example by evaluating some moment of the state density as\n\nE[f(x_t)] = \u03a3_{n=1}^{N} \u03c0_t^(n) f(s_t^(n)).   (2)\n\n4 RESULTS\n\nA good deal of experimentation has been performed in applying the CONDENSATION algorithm to the tracking of visual motion, including moving hands and dancing figures. Perhaps one of the most stringent tests was the tracking of a leaf on a bush, in which the foreground leaf is effectively camouflaged against the background.\n\nA 12 second (600 field) sequence shows a bush blowing in the wind, the task being to track one particular leaf. A template was drawn by hand around a still of one chosen leaf and allowed to undergo affine deformations during tracking. Given that a clutter-free training sequence is not available, the motion model was learned by means of a bootstrap procedure (Blake et al., 1995). A tracker with default dynamics proved capable of tracking the first 150 fields of a training sequence before losing \n\n", "award": [], "sourceid": 1289, "authors": [{"given_name": "Andrew", "family_name": "Blake", "institution": null}, {"given_name": "Michael", "family_name": "Isard", "institution": null}]}