{"title": "A Complete Variational Tracker", "book": "Advances in Neural Information Processing Systems", "page_first": 496, "page_last": 504, "abstract": "We introduce a novel probabilistic tracking algorithm that incorporates combinatorial data association constraints and model-based track management using variational Bayes. We use a Bethe entropy approximation to incorporate data association constraints that are often ignored in previous probabilistic tracking algorithms. Noteworthy aspects of our method include a model-based mechanism to replace heuristic logic typically used to initiate and destroy tracks, and an assignment posterior with linear computation cost in window length as opposed to the exponential scaling of previous MAP-based approaches. We demonstrate the applicability of our method on radar tracking and computer vision problems.", "full_text": "A Complete Variational Tracker\n\nRyan Turner\n\nNorthrop Grumman Corp.\nryan.turner@ngc.com\n\nSteven Bottone\n\nNorthrop Grumman Corp.\n\nBhargav Avasarala\n\nNorthrop Grumman Corp.\n\nsteven.bottone@ngc.com\n\nbhargav.avasarala@ngc.com\n\nAbstract\n\nWe introduce a novel probabilistic tracking algorithm that incorporates combi-\nnatorial data association constraints and model-based track management using\nvariational Bayes. We use a Bethe entropy approximation to incorporate data\nassociation constraints that are often ignored in previous probabilistic tracking al-\ngorithms. Noteworthy aspects of our method include a model-based mechanism\nto replace heuristic logic typically used to initiate and destroy tracks, and an as-\nsignment posterior with linear computation cost in window length as opposed to\nthe exponential scaling of previous MAP-based approaches. We demonstrate the\napplicability of our method on radar tracking and computer vision problems.\n\nThe \ufb01eld of tracking is broad and possesses many applications, particularly in radar/sonar [1],\nrobotics [14], and computer vision [3]. 
Consider the following problem: A radar is tracking a \ufb02ying\nobject, referred to as a target, using measurements of range, bearing, and elevation; it may also have\nDoppler measurements of radial velocity. We would like to construct a track which estimates the tra-\njectory of the object over time. The Kalman \ufb01lter [16], or a more general state space model, is used\nto \ufb01lter out measurement errors. The key difference between tracking and \ufb01ltering is the presence of\nclutter (noise measurements) and missed detections of true objects. We must determine which mea-\nsurement to \u201cplug in\u201d to the \ufb01lter before applying it; this is known as data association. Additionally\ncomplicating the situation is that we may be in a multi-target tracking scenario in which there are\nmultiple objects to track and we do not know which measurement originated from which object.\nThere is a large body of work on tracking algorithms given its standing as a long-posed and important\nproblem. Algorithms vary primarily on their approach to data association. The dominant approach\nuses a sliding window MAP estimate of the measurement-to-track assignment, in particular the\nmultiple hypothesis tracker (MHT) [1]. In the standard MHT, at every frame the algorithm \ufb01nds\nthe most likely matching of measurements to tracks, in the form of an assignment matrix, under a\none-to-one constraint (see Figure 1). One track can only result in one measurement, and vice versa,\nwhich we refer to as framing constraints. As is typical in MAP estimation, once an assignment\nis determined, the \ufb01lters are updated and the tracker proceeds as if these assignments were known\nto be correct. The one-to-one constraint makes MAP estimation a bipartite matching task where\nalgorithms exist to solve it exactly in polynomial time in the number of tracks NT [15]. 
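The single-frame MAP association step described above can be made concrete with a small sketch (our own illustration, not code from the paper): given a matrix of pairing log likelihoods, find the one-to-one measurement-to-track map with the best total score. We brute-force the search here for clarity; at scale, the polynomial-time bipartite matching algorithms cited in the text replace the enumeration loop. The score values and the miss/clutter log-probabilities below are hypothetical.

```python
# Single-frame MAP data association under one-to-one (framing) constraints.
# L[i][j] = log likelihood of pairing track i with measurement j; a track may
# instead be missed, and an unclaimed measurement is clutter.
from itertools import product

def map_assignment(L, miss_logp, clutter_logp):
    """Return (score, assoc) where assoc[i] is the measurement index
    assigned to track i, or None for a missed detection."""
    n_trk = len(L)
    n_meas = len(L[0]) if n_trk else 0
    best = (float("-inf"), None)
    # Each track picks a measurement or None; enforce one-to-one on measurements.
    for choice in product([None] + list(range(n_meas)), repeat=n_trk):
        used = [j for j in choice if j is not None]
        if len(used) != len(set(used)):
            continue  # a measurement was claimed twice
        score = sum(L[i][j] if j is not None else miss_logp
                    for i, j in enumerate(choice))
        score += clutter_logp * (n_meas - len(used))  # leftovers are clutter
        if score > best[0]:
            best = (score, choice)
    return best

# Hypothetical 2-track, 2-measurement frame: track 0 pairs well with
# measurement 0, track 1 with measurement 1.
L = [[-1.0, -9.0],
     [-8.0, -2.0]]
score, assoc = map_assignment(L, miss_logp=-5.0, clutter_logp=-6.0)
```

Exactly this maximization, chained greedily frame by frame, is what commits the MAP-style tracker to possibly wrong associations.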
However, the multi-frame MHT finds the joint MAP assignment over multiple frames, in which case the assignment problem is known to be NP-hard, although good approximate solvers exist [20].\n\nFigure 1: Simple scenario with a track swap: filtered state estimates ∗, associated measurements +, and clutter ·; and corresponding graphical model (nodes: measurements Z_{1:k}, track states X_{1:k}, assignment matrices A_{1:k}, and meta-states S_{1:k}). In the plotted scenario the clutter corresponds to birds and the three tracks to a 747, a 777, and a Cessna. Note that X_k is a matrix since it contains state vectors for all three tracks.\n\nDespite the complexity of the MHT, it only finds a sliding window MAP estimate of measurement-to-track assignments. If a clutter measurement is by chance associated with a track for the duration of a window then the tracker will assume with certainty that the measurement originated from that track, and never reconsider despite all future evidence to the contrary. If multiple clutter (or otherwise incorrect) measurements are associated with a track, then it may veer “off into space” and result in spurious tracks. Likewise, an endemic problem in tracking is the issue of track swaps, where two trajectories can cross and get mixed up as shown in Figure 1. Alternatives to the MAP approach include the probabilistic MHT (PMHT) [9, Ch. 4] and probabilistic data association (PDA). However, the PMHT drops the one-to-one constraint in data association and the PDA only allows for a single target. This led to the development of the joint PDA (JPDA) algorithm for multiple targets, which utilizes heuristic calculations of the assignment weights and does not scale to multiple frame assignment. Particle filter implementations of the JPDA have tried to alleviate these issues, but they have not been adopted into real-time systems due to their inefficiency and lack of robustness. 
The probability hypothesis density (PHD) filter [19] addresses many of these issues, but only estimates the intensity of objects and does not model full trajectories; this is undesirable since the identity of an object is required for many applications including the examples in this paper.\n\nLázaro-Gredilla et al. [18] made the first attempt at a variational Bayes (VB) tracker. In their approach every trajectory follows a Gaussian process (GP); measurements are thus modeled by a mixture of GPs. We develop additional VB machinery to retain the framing constraints, which are dropped in Lázaro-Gredilla et al. [18] despite being viewed as important in many systems. Secondly, our algorithm utilizes a state space approach (e.g. Kalman filters) to model tracks, providing linear rather than cubic time complexity in track length. Hartikainen and Särkkä [11] showed by an equivalence that there is little loss of modeling flexibility by taking a state space approach over GPs.\n\nMost novel tracking algorithms neglect the critical issue of track management. Many tracking algorithms unrealistically assume that the number of tracks NT is known a priori and fixed. Additional “wrapper logic” is placed around the trackers to initiate and destroy tracks. This logic involves many heuristics such as M-of-N logic [1, Ch. 3]. Our method replaces these heuristics in a model-based manner to make significant performance gains. We call our method a complete variational tracker as it simultaneously does inference for track management, data association, and state estimation.\n\nThe outline of the paper is as follows: We first describe the full joint probability distribution of the tracking problem in Section 1. 
This includes how to solve the track management problem by\naugmenting tracks with an active/dormant state to address the issue of an unknown number of tracks.\nBy studying the full joint we develop a new conjugate prior on assignment matrices in Section 2.\nUsing this new formulation we develop a variational algorithm for estimating the measurement-to-\ntrack assignments and track states in Section 3. To retain the framing constraints and ef\ufb01ciently\nscale in tracks and measurements, we modify the variational lower bound in Section 4 using a Bethe\nentropy approximation. This results in a loopy belief propagation (BP) algorithm being used as\na subroutine in our method. In Sections 5\u20136 we show the improvements our method makes on a\ndif\ufb01cult radar tracking example and a real data computer vision problem in sports.\nOur paper presents the following novel contributions: First, we develop the \ufb01rst ef\ufb01cient deter-\nministic approximate inference algorithm for solving the full tracking problem, which includes the\nframing constraints and track management. The most important observation is that the VB assign-\nment posterior has an induced factorization over time with regard to assignment matrices. Therefore,\nthe computational cost of our variational approach is linear in window length as opposed to the ex-\nponential cost of the MAP approach. The most astounding aspect is that by introducing a weaker\napproximation (VB factorization vs MAP) we lower the computational cost from exponential to\nlinear; this is a truly rare and noteworthy example. Second, in the process, we develop new approx-\nimate inference methods on assignment matrices and a new conjugate assignment prior (CAP). We\nbelieve these methods have much larger applicability beyond our current tracking algorithm. 
Third, we develop a process to handle the track management problem in a model-based way.\n\n1 Model Setup for the Tracking Problem\n\nIn this section we describe the full model used in the tracking problem and develop an unambiguous notation. At each time step k ∈ N1, known as a frame, we observe NZ(k) ∈ N0 measurements, in a matrix Z_k = {z_{j,k}}_{j=1..NZ(k)}, from both real targets and clutter (spurious measurements). In the radar example z_{j,k} ∈ Z is a vector of position measurements in R^3. In data association we estimate the assignment matrices A, where A_{ij} = 1 if and only if track i is associated with measurement j. Recall that each track is associated with at most one measurement, and vice versa, implying:\n\n∑_{j=0..NZ} A_{ij} = 1 , i ∈ 1:NT , ∑_{i=0..NT} A_{ij} = 1 , j ∈ 1:NZ , A_{00} = 0 . (1)\n\nThe zero indices of A ∈ {0,1}^{NT+1×NZ+1} are the “dummy row” and “dummy column” to represent the assignment of a measurement to clutter and the assignment of a track to a missed detection.\n\nDistribution on Assignments Although not explicitly stated in the literature, a careful examination of the cost functions used in the MAP optimization in MHT yields a particular and intuitive prior on the assignment matrices. The number of tracks NT is assumed known a priori and NZ is random. The corresponding generative process on assignment matrices is as follows: 1) Start with a one-to-one mapping from measurements to tracks: A ← I_{NT×NT}. 2) Each track is observed with probability PD ∈ [0,1]^{NT}. Only keep the columns of detected tracks: A ← A(·, d), d_i ∼ Bernoulli(PD(i)). 3) Sample a Poisson number of clutter measurements (columns): A ← [A , 0_{NT×Nc}], Nc ∼ Poisson(λ). 4) Use a random permutation vector π to make the measurement order arbitrary: A ← A(·, π). 
5) Append a dummy row and column on A to satisfy the summation constraints (1). This process gives the following normalized prior on assignments:\n\nP(A|PD) = λ^{Nc} exp(−λ)/NZ! · ∏_{i=1..NT} PD(i)^{d_i} (1 − PD(i))^{1−d_i} . (2)\n\nNote that the detections d, NZ, and clutter measurement count Nc are deterministic functions of A.\n\nTrack Model We utilize a state space formulation over K time steps. The latent states x_{1:K} ∈ X^K follow a Markov process, while the measurements z_{1:K} ∈ Z^K are iid conditional on the track state:\n\np(z_{1:K}, x_{1:K}) = p(x_1) ∏_{k=2..K} p(x_k|x_{k−1}) ∏_{k=1..K} p(z_k|x_k) , (3)\n\nwhere we have dropped the track and measurement indices i and j. Although more general models are possible, within this paper each track independently follows a linear system (i.e. Kalman filter):\n\np(x_k|x_{k−1}) = N(x_k|F x_{k−1}, Q) , p(z_k|x_k) = N(z_k|H x_k, R) . (4)\n\nTrack Meta-states We address the track management problem by augmenting track states with a two-state Markov model with an active/dormant meta-state s_k in a 1-of-N encoding:\n\nP(s_{1:K}) = P(s_1) ∏_{k=2..K} P(s_k|s_{k−1}) , s_k ∈ {0,1}^{NS} . (5)\n\nThis effectively allows us to handle an unknown number of tracks by making NT arbitrarily large; PD is now a function of s with a very small PD in the dormant state and a larger PD in the active state. Extensions with a larger number of states NS are easily implementable. 
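The generative steps 1)-5) can also be sketched directly in code (an illustration under our own naming, not the authors' implementation). The output is an assignment matrix A, including the dummy row and column, which by construction satisfies the framing constraints (1); the Poisson draw uses simple inverse-CDF sampling to stay dependency-free.

```python
# Sample an assignment matrix A from the generative process behind Eq. (2).
# Returns an (n_trk+1) x (n_meas+1) 0/1 matrix; row 0 / column 0 are the
# dummy clutter row and missed-detection column.
import math
import random

def sample_assignment(p_detect, clutter_rate, rng=None):
    rng = rng or random.Random(0)
    n_trk = len(p_detect)
    # 1-2) each track is detected (keeps a measurement column) w.p. p_detect[i]
    detected = [i for i in range(n_trk) if rng.random() < p_detect[i]]
    # 3) Poisson number of clutter columns (inverse-CDF sampling)
    u, nc, cdf, pmf = rng.random(), 0, 0.0, math.exp(-clutter_rate)
    while cdf + pmf < u:
        cdf += pmf
        nc += 1
        pmf *= clutter_rate / nc
    # 4) random permutation makes the measurement order arbitrary
    cols = detected + ["clutter"] * nc
    rng.shuffle(cols)
    # 5) append dummy row/column so the summation constraints (1) hold
    n_meas = len(cols)
    A = [[0] * (n_meas + 1) for _ in range(n_trk + 1)]
    for j, owner in enumerate(cols, start=1):
        A[0 if owner == "clutter" else owner + 1][j] = 1
    for i in range(1, n_trk + 1):
        if (i - 1) not in detected:
            A[i][0] = 1  # missed detection goes to the dummy column
    return A
```

Every sampled A has each real row and each real measurement column summing to one, matching (1).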
We refer to the collection of track meta-states over all tracks at frame k as S_k := {s_{i,k}}_{i=1..NT}; likewise, X_k := {x_{i,k}}_{i=1..NT}.\n\nFull Model We combine the assignment process and track models to get the full model joint:\n\np(Z_{1:K}, X_{1:K}, A_{1:K}, S_{1:K}) = ∏_{k=1..K} p(Z_k|X_k, A_k) p(X_k|X_{k−1}) P(S_k|S_{k−1}) P(A_k|S_k) (6)\n= ∏_{k=1..K} P(A_k|S_k) · ∏_{i=1..NT} p(x_{i,k}|x_{i,k−1}) P(s_{i,k}|s_{i,k−1}) · ∏_{j=1..NZ(k)} p_0(z_{j,k})^{A^k_{0j}} ∏_{i=1..NT} p(z_{j,k}|x_{i,k}, A^k_{ij} = 1)^{A^k_{ij}} ,\n\nwhere p_0 is the clutter distribution, which is often a uniform distribution. The traditional goal in tracking is to compute p(X_k|Z_{1:k}), the exact computation of which is intractable due to the “combinatorial explosion” in summing out the assignments A_{1:k}. The MHT MAP-based approach tackles this with P(A_{k1:k2}|Z_{1:k}) ≈ I{A_{k1:k2} = Â_{k1:k2}} for a sliding window w = k2 − k1 + 1. Clearly an approximation is needed, but we show how to do much better than the MAP approach of the MHT. This motivates the next section where we derive a conjugate prior on the assignments A_{1:k}, which is useful for improving upon MAP; and we cast (2) as a special case of this distribution.\n\n2 The Conjugate Assignment Prior\n\nGiven that we must compute the posterior P(A|Z),¹ it is natural to ask what conjugate priors on A are possible. Deriving approximate inference procedures is often greatly simplified if the prior on the parameters is conjugate to the complete data likelihood: p(Z, X|A) [2]. 
We follow the standard procedure for deriving the conjugate prior for an exponential family (EF) complete likelihood:\n\np(Z, X|A) = ∏_{j=1..NZ} p_0(z_j)^{A_{0j}} ∏_{i=1..NT} p(z_j|x_i, A_{ij} = 1)^{A_{ij}} ∏_{i=1..NT} p(x_i) = ∏_{i=1..NT} p(x_i) exp(1⊤(A ⊙ L)1) ,\nL_{ij} := log p(z_j|x_i, A_{ij} = 1) , L_{i0} := 0 , L_{0j} := log p_0(z_j) , (7)\n\nwhere we have introduced the matrix L ∈ R^{NT+1×NZ+1} to represent log likelihood contributions from various assignments. Therefore, we have the following EF quantities [4, Ch. 2.4]: base measure h(Z, X) = ∏_{i=1..NT} p(x_i), partition function g(A) = 1, natural parameters η(A) = vec A, and sufficient statistics T(Z, X) = vec L. This implies the conjugate assignment prior (CAP) for P(A|χ):\n\nCAP(A|χ) := Z(χ)^{−1} I{A ∈ A} exp(1⊤(χ ⊙ A)1) , Z(χ) := ∑_{A∈A} exp(1⊤(χ ⊙ A)1) , (8)\n\nwhere A is the set of all assignment matrices that obey the one-to-one constraints (1). Note that χ is a function of the track meta-states S. We recover the assignment prior of (2) in the form of the CAP distribution (8) via the following parameter settings, with σ(·) denoting the logistic function,\n\nχ_{ij} = log( PD(i) / ((1 − PD(i)) λ) ) = σ^{−1}(PD(i)) − log λ , i ∈ 1:NT , j ∈ 1:NZ , χ_{0j} = χ_{i0} = 0 . (9)\n\nDue to the symmetries in the prior of (9) we can analytically normalize (8) in this special case:\n\nZ(χ)^{−1} = P(A_{1:NT,1:NZ} = 0) = Poisson(NZ|λ) ∏_{i=1..NT} (1 − PD(i)) . (10)\n\nGiven that the dummy row and columns of χ are zero in (9), equation (10) is clearly the only way to get (8) to match (2) for the 0 assignment case. Although the conjugate prior (8) allows us to “compute” the posterior, χ_posterior = χ_prior + L, computing E[A] or Z(χ) remains difficult in general. This will cause problems in Section 3, but be ameliorated in Section 4 by a slight modification of the variational objective.\n\nOne insight into the partition function Z(χ) is that if we slightly change the constraints in A so that all the rows and columns must sum to one, i.e. we do not use a dummy row or column and A becomes the set of permutation matrices, then Z(χ) is equal to the matrix permanent of exp(χ), which is #P-complete to compute [24]. Although the matrix permanent is #P-complete, accurate and computationally efficient approximations exist, some based on belief propagation [25; 17].\n\n3 Variational Formulation\n\nAs explained in Section 1, exact inference on the full model in (6) is intractable, and as promised we show how to perform better inference than the existing solution of sliding window MAP. Our variational tracker enforces the factorization constraint that the posterior factorizes across assignment matrices and latent track states:\n\np(A_{1:K}, X_{1:K}, S_{1:K}|Z_{1:K}) ≈ q(A_{1:K}, X_{1:K}, S_{1:K}) = q(A_{1:K}) q(X_{1:K}, S_{1:K}) . (11)\n\nIn some sense we can think of A as the “parameters” with X and S as the “latent variables” and use the common variational practice of factorizing these two groups of variables. 
This gives the variational lower bound L(q):\n\nL(q) = E_q[log p(Z_{1:K}, X_{1:K}, A_{1:K}, S_{1:K})] + H[q(X_{1:K}, S_{1:K})] + H[q(A_{1:K})] , (12)\n\nwhere H[·] denotes the Shannon entropy. (¹In this section we drop the frame index k and implicitly condition on meta-states S_k for brevity.) From inspecting the VB lower bound (12) and (6) we arrive at the following induced factorizations without forcing further factorization upon (11):\n\nq(A_{1:K}) = ∏_{k=1..K} q(A_k) , q(X_{1:K}, S_{1:K}) = ∏_{i=1..NT} q(x_{i,·}) q(s_{i,·}) . (13)\n\nIn other words, the approximate posterior on assignment matrices factorizes across time; and the approximate posterior on latent states factorizes across tracks.\n\nState Posterior Update Based on the induced factorizations in (13) we derive the updates for the track states x_{i,·} and meta-states s_{i,·} separately. Additionally, we derive the updates for each track separately. We begin with the variational updates for q(x_{i,·}) using the standard VB update rules [4, Ch. 10] and (6), denoting equality to an additive constant with ≐,\n\nlog q(x_{i,·}) ≐ log p(x_{i,·}) + ∑_{k=1..K} ∑_{j=1..NZ(k)} E[A^k_{ij}] log N(z_{j,k}|H x_{i,k}, R) (14)\n⟹ q(x_{i,·}) ∝ p(x_{i,·}) ∏_{k=1..K} ∏_{j=1..NZ(k)} N(z_{j,k}|H x_{i,k}, R/E[A^k_{ij}]) . (15)\n\nUsing the standard product of Gaussians formula [6] this is proportional to\n\nq(x_{i,·}) ∝ p(x_{i,·}) ∏_{k=1..K} N(z̃_{i,k}|H x_{i,k}, R/E[d_{i,k}]) , z̃_{i,k} := (1/E[d_{i,k}]) ∑_{j=1..NZ} E[A^k_{ij}] z_{j,k} , (16)\n\nand recall that E[d_{i,k}] = 1 − E[A^k_{i0}] = ∑_{j=1..NZ} E[A^k_{ij}]. The form of the posterior q(x_{i,·}) is equivalent to a linear dynamical system with pseudo-measurements z̃_{i,k} and non-stationary measurement covariance R/E[d_{i,k}]. Therefore, q(x_{i,·}) is simply implemented using a Kalman smoother [22].\n\nMeta-state Posterior Update We next consider the posterior on the track meta-states:\n\nlog q(s_{i,·}) ≐ log P(s_{i,·}) + ∑_{k=1..K} E_{q(A_k)}[log P(A_k|S_k)] ≐ log P(s_{i,·}) + ∑_{k=1..K} s_{i,k}⊤ ℓ_{i,k} , (17)\nℓ_{i,k}(s) := E[d_{i,k}] log(PD(s)) + (1 − E[d_{i,k}]) log(1 − PD(s)) , s ∈ 1:NS , (18)\n⟹ q(s_{i,·}) ∝ P(s_{i,·}) ∏_{k=1..K} exp(s_{i,k}⊤ ℓ_{i,k}) , (19)\n\nwhere (18) follows from (2). If P(s_{i,·}) follows a Markov chain then the form for q(s_{i,·}) is the same as a hidden Markov model (HMM) with emission log likelihoods ℓ_{i,k} ∈ [R−]^{NS}. Therefore, the meta-state posterior q(s_{i,·}) update is implemented using the forward-backward algorithm [21]. Like the MHT, our algorithm also works in an online fashion using a (much larger) sliding window.\n\nAssignment Matrix Update The reader can verify using (7)–(9) that the exact updates under the lower bound L(q) (12) yield a product of CAP distributions:\n\nq(A_{1:K}) = ∏_{k=1..K} CAP(A_k | E_{q(X_k)}[L_k] + E_{q(S_k)}[χ_k]) . (20)\n\nThis poses a challenging problem, as the state posterior updates of (16) and (19) require E_{q(A_k)}[A_k]; since q(A_k) is a CAP distribution we know from Section 2 its expectation is difficult to compute.\n\n4 The Assignment Matrix Update Equations\n\nIn this section we modify the variational lower bound (12) to obtain a tractable algorithm. The resulting algorithm uses loopy belief propagation to compute E_{q(A_k)}[A_k] for use in (16) and (19).\n\nWe first note that the CAP distribution (8) is naturally represented as a factor graph:\n\nCAP(A|χ) ∝ ∏_{i=1..NT} f^R_i(A_{i·}) ∏_{j=1..NZ} f^C_j(A_{·j}) ∏_{i=0..NT} ∏_{j=0..NZ} f^S_{ij}(A_{ij}) , (21)\n\nwith f^R_i(v) := I{∑_{j=0..NZ} v_j = 1} (R for row factors), f^C_j(v) := I{∑_{i=0..NT} v_i = 1} (C for column factors), and f^S_{ij}(v) := exp(χ_{ij} v). We use reparametrization methods (see [10]) to convert (21) to a pairwise factor graph, where derivation of the Bethe free energy is easier. The Bethe entropy is:\n\nH_β[q(A)] := ∑_{i=1..NT} ∑_{j=0..NZ} H[q(r_i, A_{ij})] + ∑_{j=1..NZ} ∑_{i=0..NT} H[q(c_j, A_{ij})] − ∑_{i=1..NT} NZ·H[q(r_i)] − ∑_{j=1..NZ} NT·H[q(c_j)] − ∑_{i=1..NT} ∑_{j=1..NZ} H[q(A_{ij})] (22)\n= ∑_{i=1..NT} H[q(A_{i·})] + ∑_{j=1..NZ} H[q(A_{·j})] − ∑_{i=1..NT} ∑_{j=1..NZ} H[q(A_{ij})] , (23)\n\nwhere the pairwise conversion used constrained auxiliary variables r_i := A_{i·} and c_j := A_{·j}; and used the implied relations H[q(r_i, A_{ij})] = H[q(r_i)] + H[q(A_{ij}|r_i)] = H[q(r_i)] = H[q(A_{i·})]. We define an altered variational lower bound L_β(q), which merely replaces the entropy H[q(A_k)] with H_β[q(A_k)].² Note that L_β(q) ≐ L(q) with respect to q(X_{1:K}, S_{1:K}), which implies the state posterior updates under the old bound L(q) in (16) and (19) remain unchanged with the new bound L_β(q). To get the new update equations for q(A_k) we examine L_β(q) in terms of q(A_{1:K}):\n\nL_β(q) ≐ E_q[log p(Z_{1:K}|X_{1:K}, A_{1:K})] + E_q[log P(A_{1:K}|S_{1:K})] + ∑_{k=1..K} H_β[q(A_k)] (24)\n≐ ∑_{k=1..K} E_{q(A_k)}[1⊤(A_k ⊙ (E_{q(X_k)}[L_k] + E_{q(S_k)}[χ_k]))1] + ∑_{k=1..K} H_β[q(A_k)] (25)\n≐ ∑_{k=1..K} ( E_{q(A_k)}[log CAP(A_k | E_{q(X_k)}[L_k] + E_{q(S_k)}[χ_k])] + H_β[q(A_k)] ) . (26)\n\nThis corresponds to the Bethe free energy of the factor graph described in (21), with E[L_k] + E[χ_k] as the CAP parameter [26; 12]. 
Therefore, we can compute E[A_k] using loopy belief propagation.\n\nLoopy BP Derivation We define the key (row/column) quantities for the belief propagation:\n\nμR_{ij} := msg_{f^R_i → A_{ij}} , μC_{ij} := msg_{f^C_j → A_{ij}} , νR_{ij} := msg_{A_{ij} → f^R_i} , νC_{ij} := msg_{A_{ij} → f^C_j} ,\n\nwhere all messages form functions in {0,1} → R+. Using the standard rules of BP we derive:\n\nνR_{ij}(x) = μC_{ij}(x) f^S_{ij}(x) , μR_{ij}(1) = ∏_{k≠j} νR_{ik}(0) , μR_{ij}(0) = ∑_{l≠j} νR_{il}(1) ∏_{k≠j,l} νR_{ik}(0) , (27)\n\nwhere we have exploited that there is only one nonzero value in the row A_{i·}. Notice that\n\nμ̃R_{ij} := μR_{ij}(0)/μR_{ij}(1) = ∑_{l=0..NZ} νR_{il}(1)/νR_{il}(0) − νR_{ij}(1)/νR_{ij}(0) ∈ R+ , (28)\n\nwhere we have pulled μR_{ij}(1) out of (27). We write the ratio of messages to row factors νR as\n\nν̃R_{ij} := νR_{ij}(1)/νR_{ij}(0) = (μC_{ij}(1)/μC_{ij}(0)) exp(χ_{ij}) ∈ R+ . (29)\n\nWe symmetrically apply (27)–(29) to the column (i.e. C) messages μ̃C_{ij} and ν̃C_{ij}. 
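As a concrete, purely illustrative rendering of these message-ratio updates, the iteration and the resulting marginals E[A_ij] fit in a few lines; the variable names are ours, index 0 is the dummy row/column as in the text, and the treatment of dummy entries (which touch only one one-to-one factor) is our reading of the construction. A handy sanity check is that at a BP fixed point each real row and column of E[A] should sum to one, i.e. the framing constraints hold in expectation.

```python
# Loopy BP on the CAP factor graph via message ratios (cf. Eqs. (27)-(30)).
# chi is the (n_trk+1) x (n_meas+1) CAP parameter with zero dummy row/column.
import math

def bp_marginals(chi, iters=500):
    n_trk, n_meas = len(chi) - 1, len(chi[0]) - 1
    # mu_r[i][j], mu_c[i][j]: ratios mu(0)/mu(1) of messages from the row /
    # column one-to-one factors to variable A_ij.
    mu_r = [[1.0] * (n_meas + 1) for _ in range(n_trk + 1)]
    mu_c = [[1.0] * (n_meas + 1) for _ in range(n_trk + 1)]
    for _ in range(iters):
        for i in range(1, n_trk + 1):       # row factor i: sum_j A_ij = 1
            nu = [math.exp(chi[i][l]) / (mu_c[i][l] if l else 1.0)
                  for l in range(n_meas + 1)]
            tot = sum(nu)
            for j in range(n_meas + 1):
                mu_r[i][j] = tot - nu[j]
        for j in range(1, n_meas + 1):      # column factor j: sum_i A_ij = 1
            nu = [math.exp(chi[l][j]) / (mu_r[l][j] if l else 1.0)
                  for l in range(n_trk + 1)]
            tot = sum(nu)
            for i in range(n_trk + 1):
                mu_c[i][j] = tot - nu[i]
    sig = lambda x: 1.0 / (1.0 + math.exp(-x))
    # E[A_ij] = sigma(chi_ij - log mu_r - log mu_c); dummy entries only see
    # the single factor they participate in.
    E = [[0.0] * (n_meas + 1) for _ in range(n_trk + 1)]
    for i in range(1, n_trk + 1):
        E[i][0] = sig(-math.log(mu_r[i][0]))
        for j in range(1, n_meas + 1):
            E[i][j] = sig(chi[i][j] - math.log(mu_r[i][j]) - math.log(mu_c[i][j]))
    for j in range(1, n_meas + 1):
        E[0][j] = sig(-math.log(mu_c[0][j]))
    return E
```

Because the graph is loopy, these marginals are Bethe approximations of the exact CAP expectations, but the row/column sums of E[A] should still come out to one once the messages have converged.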
As is common in binary graphs, we summarize the entire message passing update scheme in terms of message ratios:\n\nμ̃R_{ij} = ∑_{l=0..NZ} ν̃R_{il} − ν̃R_{ij} , ν̃R_{ij} = exp(χ_{ij})/μ̃C_{ij} , μ̃C_{ij} = ∑_{l=0..NT} ν̃C_{lj} − ν̃C_{ij} , ν̃C_{ij} = exp(χ_{ij})/μ̃R_{ij} . (30)\n\nFinally, we compute the marginal distributions E[A_{ij}] by normalizing the product of the incoming messages to each variable: E[A_{ij}] = P(A_{ij} = 1) = σ(χ_{ij} − log μ̃R_{ij} − log μ̃C_{ij}).\n\n²In most models H_β[·] ≈ H[·], but without proof we always observe H_β[·] ≤ H[·]; so L_β is a lower bound.\n\nFigure 2: (a) Radar Example, (b) SIAP Metrics, (c) Assignment Accuracy. Left: The output of the trackers on the radar example. We show the true trajectories (red ·), 2D MHT (solid magenta), 3D MHT (solid green), and OMGP (cyan ∗). The state estimates for the VB tracker when active (black ◦) and dormant (black ×) are shown, where a ≥ 90% threshold on the meta-state s is used to deem a track active for plotting. Center: SIAP metrics for N = 100 realizations of the scenario on the left with 95% error bars. We show positional accuracy (i.e. RMSE) (PA, lower better), spurious tracks (S, lower better), and track completeness (C, higher better). The bars are in order: VB tracker (blue), 3D MHT (cyan), 2D MHT (yellow), and OMGP (red). 
The PA has been rescaled relative to OMGP so all metrics are in %.\nRight: Same as center but looking at assignment accuracy on ARI (higher better), no clutter (NC) ARI (higher\nbetter), and 0-1 loss (lower better) for classifying measurements as clutter.\n\n5 Radar Tracking Example\n\nWe borrow the radar tracking example of the OMGP paper [18]. We have made the example more\nrealistic by adding clutter \u03bb = 8 and missed detections PD = 0.5, which were omitted in [18];\nand also used N = 100 realizations to get con\ufb01dence intervals on the results. We also compare\nwith the 2D and 3D (i.e. multi-frame) MHT trackers as a baseline as they are the most widely used\nmethods in practice. The OMGP requires the number of tracks NT to be speci\ufb01ed in advance, so\nwe provided it with the true number of tracks, which should have given it an extra advantage. The\ntrackers were evaluated using the SIAP metrics, which are the standard evaluation metrics in the\n\ufb01eld [7]. We also use the adjusted Rand index (ARI) [13] to compare the accuracy of the assignments\nmade by the algorithms; the \u201cno clutter\u201d ARI (which ignores clutter) and the 0-1 loss for classifying\nmeasurements as clutter also serve as assignment metrics.\nIn Figure 2(a) both OMGP and 2D MHT miss the real tracks and create spurious tracks from clutter\nmeasurements. The 3D MHT does better, but misses the western portion of track 3 and makes a swap\nbetween track 1 and 3 at their intersection. By contrast, the VB tracker gets the scenario almost\nperfect, except for a small bit of the southern portion of track 2. In that area, VB designates the\ntrack as dormant, acknowledging that the associated measurements are likely clutter. This replaces\nthe notion of a \u201ccon\ufb01rmed\u201d track in the standard tracking literature with a model-based method,\nand demonstrates the advantages of using a principled and model-based paradigm for the track\nmanagement problem. 
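For reference, the adjusted Rand index used in these assignment comparisons can be written out from its standard pair-counting (Hubert-Arabie) definition; the snippet below is a generic sketch rather than the paper's evaluation code, treating each measurement's assigned track (or clutter) as a cluster label.

```python
# Adjusted Rand index between two labelings of the same items, from the
# pair-counting contingency form. Undefined (division by zero) in the
# degenerate case where both labelings put everything in one cluster.
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    n = len(labels_true)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)   # chance-level pair agreement
    max_index = 0.5 * (sum_a + sum_b)
    return (sum_ij - expected) / (max_index - expected)
```

The index is 1 for assignments that agree up to a relabeling of tracks and is near 0 for chance-level agreement, which is what makes it suitable for comparing measurement-to-track partitions across trackers.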
This is quantitatively shown over repeated trials in Figure 2(b) in terms of positional error; even more striking are illustrations of the near lack of spurious tracks in VB and much higher completeness than the competing methods. We also show that the assignments are much more accurate in Figure 2(c). To check the statistical significance of our results we used a paired t-test to compare the difference between VB and the second best method, the 3D MHT. The SIAP and assignment metrics all have p ≤ 10^−4.\n\n6 Real Data: Video Tracking in Sports\n\nWe use the VS-PETS 2003 soccer player data set as a real data example to validate our method. The data set is a 2500 frame video of players moving around a soccer field, with annotated ground truth; the variety of player interactions makes it a challenging test case for multi-object tracking algorithms. To demonstrate the robustness of our tracker to correct a detector given only minimal training examples, we used multi-scale histogram of oriented gradients (HOG) features from 50 positive and 50 negative examples of soccer players to train a sliding window support vector machine (SVM) [23]. HOG features have been shown to work particularly well for pedestrian detection on the Caltech and INRIA data sets, and are thus used for this example [8]. For each frame, the center of each bounding box is provided as the only input to our tracker. Despite modest detection rates from HOG-SVM, our tracker is still capable of separating clutter and dealing with missed detections.\n\nFigure 3: (a) Soccer Tracking Problem, (b) Soccer Assignment Metrics. Left: Example from soccer player tracking. We show the filtered state estimates of the MHT (magenta ·) and VB tracker (cyan ◦) for the last 25 frames as well as the true positions (black). 
The green boxes\nshow the detection of the HOG-SVM for the current frame. Right: Same as Figure 2(c) but for the soccer data.\nMethods in order: VB-DP (dark blue), VB (light blue), 3D MHT (green), 2D MHT (orange), and OMGP (red).\nSoccer data source: http://www.cvg.rdg.ac.uk/slides/pets.html.\n\nWe modeled player motion using (4) with F and Q derived from an NCV model [1, Ch. 1.5]. The\nparameters for the NCV, R, PD, \u03bb, and the track meta-state parameters were trained by optimizing\nthe variational lower bound L\u03b2 on the \ufb01rst 1000 frames, although the algorithm did not appear sen-\nsitive to these parameters. We additionally show an extension to the VB tracker with nonparametric\nclutter map learning; we learned the clutter map by passing the training measurements into a VB\nDirichlet process (DP) mixture [5] with their probability of being clutter under q(A) as weights. The\nresulting posterior predictive distribution served as p0 in the test phase; we refer to this method as\nthe VB-DP tracker. We split the remainder of the data into 70 sequences of K = 20 frames for a test\nset. Due to the nature of this example, we evaluate the batch accuracy of assigning boxes to the cor-\nrect players. This demonstrates the utility of our algorithm for building a database of player images\nfor later processing and other applications. In Figure 3(b) we show the ARI and related assignment\nmetrics for VB-DP, VB, 2D MHT, 3D MHT, and OMGP. Note that the ARI only evaluates the\naccuracy of the MAP assignment estimate of VB; VB additionally provides uncertainty estimates\non the assignments, unlike the MHT. VB manages to increase the no clutter ARI to 0.95 \u00b1 0.01\nfrom 0.86\u00b1 0.01 for 3D MHT; and decrease the 0-1 clutter loss to 0.18\u00b1 0.01 from 0.21\u00b1 0.01 for\nOMGP. Using the nonparametric clutter map lowered the 0-1 loss to 0.016 \u00b1 0.005 and increased\nthe ARI to 0.94\u00b1 0.01 (vs. 
0.76 ± 0.01 for the 2D and 3D MHT) as the VB-DP tracker knew certain areas, such as the post in the lower right, were more prone to clutter. As in the radar example, the VB vs. MHT and VB vs. OMGP improvements are significant at p ≤ 10⁻⁴. The poor NC-ARI of OMGP is likely due to its lack of framing constraints, ignoring prior information on the assignments. Furthermore, in Figure 3(a) we plot filtered state estimates for the (non-DP) VB tracker; we again use the ≥ 90% meta-state threshold as a “confirmed track.” We see that the MHT is tricked by the various false detections from HOG-SVM and has spurious tracks across the field; the VB tracker “introspectively” knows when a track is unlikely to be real. While both the MHT and VB detect the referee in the upper right of the frame, the VB tracker quickly sets this track to dormant when he leaves the frame. The MHT temporarily extrapolates the track into the field before destroying it.

7 Conclusions

The model-based manner of handling the track management problem shows clear advantages and may be the path forward for the field, which can clearly benefit from algorithms that eliminate arbitrary tuning parameters. Our method may be desirable even in tracking scenarios under which a full posterior does not confer advantages over a point estimate. We improve accuracy and reduce the exponential cost of the MAP approach to linear, which is a result of the induced factorizations of (13). We have also incorporated the often neglected framing constraints into our variational algorithm, which fits nicely with loopy belief propagation methods.
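To illustrate what enforcing the one-to-one framing constraints “in expectation” amounts to, the sketch below uses Sinkhorn scaling to turn a track-by-measurement likelihood matrix into an approximately doubly stochastic soft-assignment matrix. This is only a rough analogue of a constrained assignment posterior, not the Bethe-entropy variational update of this paper; the function name, the square no-clutter setting, and all numbers are our own illustrative assumptions.

```python
import numpy as np

def soft_assignment(likelihood, iters=500, tol=1e-9):
    """Rescale a positive track-by-measurement likelihood matrix until it is
    approximately doubly stochastic, so that in expectation each track explains
    one measurement and each measurement feeds one track (the one-to-one
    framing constraints). Illustrative sketch for the square, no-clutter case
    only; not the variational update used in the paper."""
    A = np.asarray(likelihood, dtype=float).copy()
    assert A.shape[0] == A.shape[1] and np.all(A > 0)
    for _ in range(iters):
        A /= A.sum(axis=1, keepdims=True)  # row (track) normalization
        A /= A.sum(axis=0, keepdims=True)  # column (measurement) normalization
        if np.allclose(A.sum(axis=1), 1.0, atol=tol):
            break  # both marginals now match the framing constraints
    return A

# Hypothetical 3-track, 3-measurement frame with one dominant pairing per row.
L = np.array([[0.9, 0.05, 0.05],
              [0.1, 0.8,  0.1],
              [0.2, 0.1,  0.7]])
q = soft_assignment(L)
# Rows and columns of q each sum to ~1, unlike the raw likelihoods in L.
```

Methods such as OMGP, which lack these constraints, can place high assignment mass on the same measurement for several tracks at once; the scaled matrix above cannot.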
Other areas, such as more sophisticated meta-state models, provide opportunities to extend this work to further applications of tracking and to establish it as a general alternative to dominant approaches such as the MHT.

References

[1] Bar-Shalom, Y., Willett, P., and Tian, X. (2011). Tracking and Data Fusion: A Handbook of Algorithms. YBS Publishing.

[2] Beal, M. and Ghahramani, Z. (2003). The variational Bayesian EM algorithm for incomplete data: with application to scoring graphical model structures. In Bayesian Statistics, volume 7, pages 453–464.

[3] Benfold, B. and Reid, I. (2011). Stable multi-target tracking in real-time surveillance video. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 3457–3464. IEEE.

[4] Bishop, C. M. (2007). Pattern Recognition and Machine Learning. Springer.

[5] Blei, D. M. and Jordan, M. I. (2006). Variational inference for Dirichlet process mixtures. Bayesian Analysis, 1(1):121–143.

[6] Bromiley, P. (2013). Products and convolutions of Gaussian probability density functions. Tina-Vision Memo 2003-003, University of Manchester.

[7] Byrd, E. (2003). Single integrated air picture (SIAP) attributes version 2.0. Technical Report 2003-029, DTIC.

[8] Dalal, N. and Triggs, B. (2005). Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition (CVPR), 2005 IEEE Conference on, pages 886–893.

[9] Davey, S. J. (2003). Extensions to the probabilistic multi-hypothesis tracker for improved data association. PhD thesis, The University of Adelaide.

[10] Eaton, F. and Ghahramani, Z. (2013). Model reductions for inference: Generality of pairwise, binary, and planar factor graphs. Neural Computation, 25(5):1213–1260.

[11] Hartikainen, J. and Särkkä, S. (2010).
Kalman filtering and smoothing solutions to temporal Gaussian process regression models. In Machine Learning for Signal Processing (MLSP), pages 379–384. IEEE.

[12] Heskes, T. (2003). Stable fixed points of loopy belief propagation are minima of the Bethe free energy. In Advances in Neural Information Processing Systems 15, pages 359–366. MIT Press.

[13] Hubert, L. and Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1):193–218.

[14] Jensfelt, P. and Kristensen, S. (2001). Active global localization for a mobile robot using multiple hypothesis tracking. Robotics and Automation, IEEE Transactions on, 17(5):748–760.

[15] Jonker, R. and Volgenant, A. (1987). A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing, 38(4):325–340.

[16] Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering, 82(Series D):35–45.

[17] Lau, R. A. and Williams, J. L. (2011). Multidimensional assignment by dual decomposition. In Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), 2011 Seventh International Conference on, pages 437–442. IEEE.

[18] Lázaro-Gredilla, M., Van Vaerenbergh, S., and Lawrence, N. D. (2012). Overlapping mixtures of Gaussian processes for the data association problem. Pattern Recognition, 45(4):1386–1395.

[19] Mahler, R. (2003). Multitarget Bayes filtering via first-order multitarget moments. Aerospace and Electronic Systems, IEEE Transactions on, 39(4):1152–1178.

[20] Poore, A. P., Rijavec, N., Barker, T. N., and Munger, M. L. (1993). Data association problems posed as multidimensional assignment problems: algorithm development. In Optical Engineering and Photonics in Aerospace Sensing, pages 172–182. International Society for Optics and Photonics.

[21] Rabiner, L.
(1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286.

[22] Rauch, H. E., Tung, F., and Striebel, C. T. (1965). Maximum likelihood estimates of linear dynamical systems. AIAA Journal, 3(8):1445–1450.

[23] Schölkopf, B. and Smola, A. J. (2001). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. The MIT Press, Cambridge, MA, USA.

[24] Valiant, L. G. (1979). The complexity of computing the permanent. Theoretical Computer Science, 8(2):189–201.

[25] Watanabe, Y. and Chertkov, M. (2010). Belief propagation and loop calculus for the permanent of a non-negative matrix. Journal of Physics A: Mathematical and Theoretical, 43(24):242002.

[26] Yedidia, J. S., Freeman, W. T., and Weiss, Y. (2001). Bethe free energy, Kikuchi approximations, and belief propagation algorithms. In Advances in Neural Information Processing Systems 13.