{"title": "Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound", "book": "Advances in Neural Information Processing Systems", "page_first": 2681, "page_last": 2689, "abstract": "In this work we use Branch-and-Bound (BB) to efficiently detect objects with deformable part models. Instead of evaluating the classifier score exhaustively over image locations and scales, we use BB to focus on promising image locations. The core problem is to compute bounds that accommodate part deformations; for this we adapt the Dual Trees data structure to our problem. We evaluate our approach using Mixture-of-Deformable Part Models. We obtain exactly the same results but are 10-20 times faster on average. We also develop a multiple-object detection variation of the system, where hypotheses for 20 categories are inserted in a common priority queue. For the problem of finding the strongest category in an image this results in up to a 100-fold speedup.", "full_text": "Rapid Deformable Object Detection using Dual-Tree\n\nBranch-and-Bound\n\nIasonas Kokkinos\n\nCenter for Visual Computing\n\nEcole Centrale de Paris\n\niasonas.kokkinos@ecp.fr\n\nAbstract\n\nIn this work we use Branch-and-Bound (BB) to ef\ufb01ciently detect objects with de-\nformable part models. Instead of evaluating the classi\ufb01er score exhaustively over\nimage locations and scales, we use BB to focus on promising image locations.\nThe core problem is to compute bounds that accommodate part deformations; for\nthis we adapt the Dual Trees data structure [7] to our problem.\nWe evaluate our approach using Mixture-of-Deformable Part Models [4]. We ob-\ntain exactly the same results but are 10-20 times faster on average. We also de-\nvelop a multiple-object detection variation of the system, where hypotheses for 20\ncategories are inserted in a common priority queue. 
For the problem of \ufb01nding the\nstrongest category in an image this results in a 100-fold speedup.\n\n1 Introduction\n\nDeformable Part Models (DPMs) deliver state-of-the-art object detection results [4] on challenging\nbenchmarks when trained discriminatively, and have become a standard in object recognition re-\nsearch. At the heart of these models lies the optimization of a merit function -the classi\ufb01er score-\nwith respect to the part displacements and the global object pose. In this work we take the classi\ufb01er\nfor granted, using the models of [4], and focus on the optimization problem.\n\nThe most common detection algorithm used in conjunction with DPMs relies on Generalized Dis-\ntance Transforms (GDTs) [5], whose complexity is linear in the image size. Despite its amazing\nef\ufb01ciency this algorithm still needs to \ufb01rst evaluate the score everywhere before picking its maxima.\n\nIn this work we use Branch-and-Bound in conjunction with part-based models. For this we exploit\nthe Dual Tree (DT) data structure [7], developed originally to accelerate operations related to Kernel\nDensity Estimation (KDE). We use DTs to provide the bounds required by Branch-and-Bound.\n\nOur method is fairly generic; it applies to any star-shape graphical model involving continuous\nvariables, and pairwise potentials expressed as separable, decreasing binary potential kernels. We\nevaluate our technique using the mixture-of-deformable part models of [4]. Our algorithm delivers\nexactly the same results, but is 15-30 times faster. We also develop a multiple-object detection\nvariation of the system, where all object hypotheses are inserted in the same priority queue. 
If our task is to find the best (or k-best) object hypotheses in an image this can result in a 100-fold speedup.\n\n2 Previous Work on Efficient Detection\n\nCascaded object detection [20] has led to a proliferation of vision applications, but far less work exists to deal with part-based models. The combinatorics of matching have been extensively studied for rigid objects [8], while [17] used A* for detecting object instances. For categories, recent works [1, 10, 11, 19, 6, 18, 15] have focused on reducing the high-dimensional pose search space during detection by initially simplifying the cost function being optimized, mostly using ideas similar to A* and coarse-to-fine processing. In the recent work of [4], thresholds pre-computed on the training set are used to prune computation, resulting in substantial speedups compared to GDTs.\n\nBranch-and-bound (BB) prioritizes the search of promising image areas, as indicated by an upper bound on the classifier's score. A most influential paper has been the Efficient Subwindow Search (ESS) technique of [12], where an upper bound on a bag-of-words classifier score delivers the bounds required by BB. Later [16] combined Graph-Cuts with BB for object segmentation, while in [13] a general cascade system was devised for efficient detection with a nonlinear classifier.\n\nOur work is positioned with respect to these works as follows: unlike existing BB works [16, 12, 15], we use the DPM cost and thereby accommodate parts in a rigorous energy minimization framework. And unlike the pruning-based works [1, 6, 4, 18], we do not make any approximations or assumptions about when it is legitimate to stop computation; our method is exact.\n\nWe obtain the bound required by BB from Dual Trees. 
To the best of our knowledge, Dual Trees have barely been used in object detection; we are only aware of the work in [9], which used DTs to efficiently generate particles for Nonparametric Belief Propagation. Here we show that DTs can be used for part-based detection, which is related conceptually, but entirely different technically.\n\n3 Preliminaries\n\nWe first describe the cost function used in DPMs, then outline the limitations of GDT-based detection, and finally present the concepts of Dual Trees relevant to our setting. Due to lack of space we refer to [2, 4] for further details on DPMs and to [7, 14] for Dual Trees.\n\n3.1 Merit function for DPMs\n\nWe consider a star-shaped graphical model consisting of a set of P + 1 nodes {n_0, . . . , n_P}; n_0 is called the root and the part nodes n_1, . . . , n_P are connected to the root. Each node p has a unary observation potential U_p(x), indicating the fidelity of the image at x to the node; e.g. in [2] U_p(x) is the inner product of a HOG feature at x with a discriminant w_p for p.\n\nThe location x_p = (h_p, v_p) of part p is constrained with respect to the root location x_0 = (h_0, v_0) in terms of a quadratic binary potential B_p(x_p, x_0) of the form:\n\nB_p(x_p, x_0) = −(x_p − x_0 − µ_p)^T I_p (x_p − x_0 − µ_p) = −(h_p − h_0 − η_p)^2 H_p − (v_p − v_0 − ν_p)^2 V_p,\n\nwhere I_p = diag(H_p, V_p) is a diagonal precision matrix and µ_p = (η_p, ν_p) is the nominal difference of root-part locations. We will freely alternate between the vector x and its horizontal/vertical h/v coordinates. Moreover we consider µ_0 = (0, 0) and H_0, V_0 large enough so that B_0(x_p, x_0) will be zero for x_p = x_0 and practically infinite elsewhere.\n\nIf the root is at x_0 the merit for part p being at x_p is given by m_p(x_p, x_0) = U_p(x_p) + B_p(x_p, x_0); summing over p gives the score Σ_p m_p(x_p, x_0) of a root-and-parts configuration X = (x_0, . . . , x_P). 
The detector score at point x is obtained by maximizing over those X with x_0 = x; this amounts to computing:\n\nS(x) := Σ_{p=0}^{P} max_{x_p} m_p(x_p, x) = Σ_{p=0}^{P} max_{x_p} U_p(x_p) − (h_p − h − η_p)^2 H_p − (v_p − v − ν_p)^2 V_p.    (1)\n\nA GDT can be used to maximize each summand in Eq. 1 jointly for all values of x_0 in time O(N), where N is the number of possible locations. This is dramatically faster than the naive O(N^2) computation. For a P-part model, complexity decreases from O(N^2 P) to O(N P).\n\nStill, the N factor can make things slow for large images. If we know that a certain threshold will be used for detection, e.g. −1 for a classifier trained with SVMs, the GDT-based approach turns out to be wasteful, as it treats all image locations equally, even those where we can quickly realize that the classifier score cannot exceed this threshold.\n\nThis is illustrated in Fig. 1: in (a) we show the part-root configuration that gives the maximum score, and in (b) the score of a bicycle model from [4] over the whole image domain. Our approach\n\n(a) Input & Detection result (b) Detector score S(x) (c) BB for arg max_x S(x) (d) BB for S(x) ≥ −1.\n\nFigure 1: Motivation for the Branch-and-Bound (BB) approach: standard part-based models evaluate a classifier's score S(x) over the whole image domain. Typically only a tiny portion of the image domain should be positive; in (b) we draw a black contour around {x : S(x) > −1} for an SVM-based classifier. BB ignores large intervals with low S(x) by upper bounding their values, and postponing their 'exploration' in favor of more promising ones. 
In (c) we show as heat maps the upper bounds of the intervals visited by BB until the strongest location was explored, and in (d) of the intervals visited until all locations x with S(x) > −1 were explored.\n\nspeeds up detection by upper bounding the score of the detector within intervals of x while using low-cost operations. This allows us to use a prioritized search strategy that can refine these bounds on promising intervals, while postponing the exploration of less promising intervals.\n\nThis is demonstrated in Fig. 1(c,d), where we show as heat maps the upper bounds of the intervals visited by BB: parts of the image where the heat maps are more fine-grained correspond to image locations that seemed promising. If our goal is to maximize S(x), BB discards a huge amount of computation, as shown in (c); even with a more conservative criterion, i.e. finding all x : S(x) > −1 (d), a large part of the image domain is effectively ignored and the algorithm obtains refined bounds only around 'interesting' image locations.\n\n3.2 Dual Trees: Data Structures for Set-Set interactions\n\nThe main technical challenge is to efficiently compute upper bounds for a model involving deformable parts; our main contribution consists in realizing that this can be accomplished with the Dual Tree data structure of [7]. We now give a high-level description of Dual Trees, deferring the concrete aspects of their adaptation to the detection problem; we assume the reader is familiar with KD-trees.\n\nDual Trees were developed to efficiently evaluate expressions of the form:\n\nP(x_j) = Σ_{i=1}^{N} w_i K(x_j, x_i),    x_i ∈ X_S, i = 1, . . . , N,    x_j ∈ X_D, j = 1, . . . , M,    (2)\n\nwhere K(·, ·) is a separable, decreasing kernel, e.g. a Gaussian with diagonal covariance. 
We refer to X_S as 'source' terms, and to X_D as 'domain' terms, the idea being that the source points X_S generate a 'field' P, which we want to evaluate at the domain locations X_D.\n\nNaively performing the computation in Eq. 2 considers all source-domain interactions and takes NM operations. The Dual Tree algorithm efficiently computes this sum by using two KD-trees, one (S) for the source locations X_S and another (D) for the domain locations X_D. This allows for substantial speedups when computing Eq. 2 for all domain points, as illustrated in Fig. 2: if a 'chunk' of source points cannot affect a 'chunk' of domain points, we skip computing their domain-source point interactions.\n\n4 DPM optimization using Dual Tree Branch and Bound\n\nBranch and Bound (BB) is a maximization algorithm for non-parametric, non-convex or even non-differentiable functions. BB searches for the interval containing the function's maximum using a prioritized search strategy; the priority of an interval is determined by the function's upper bound within it. Starting from an interval containing the whole function domain, BB increasingly narrows down to the solution: at each step an interval of solutions is popped from a priority queue, split into sub-intervals (Branch), and a new upper bound for those intervals is computed (Bound). These intervals are then inserted in the priority queue and the process repeats until a singleton interval is popped. If the bound is tight for singletons, the first singleton popped will be the function's global maximum.\n\nFigure 2: Left: Dual Trees efficiently deal with the interaction of 'source' (red) and 'domain' points (blue), using easily computable bounds. 
For instance, points lying in square 6 cannot have a large effect on points in square A, therefore we do not need to go to a finer level of resolution to exactly estimate their interactions. Right: illustration of the terms involved in the geometric bound computations of Eq. 10.\n\nComing to our case, the DPM criterion developed in Sec. 3.1 is a sum of scores of the form:\n\ns_p(x_0) = max_{x_p} m_p(x_p, x_0) = max_{(h_p, v_p)} U_p(h_p, v_p) − (h_p − h_0 − η_p)^2 H_p − (v_p − v_0 − ν_p)^2 V_p.    (3)\n\nUsing Dual Tree terminology, the 'source points' correspond to part locations x_p, i.e. X_{S_p} = {x_p}, and the 'domain points' to object locations x_0, i.e. X_D = {x_0}. Dual Trees allow us to efficiently derive bounds for s_p(x_0), x_0 ∈ X_D, the scores that a set of object locations can have due to a set of part p locations. Once these are formed, we add over parts to bound the score S(x_0) = Σ_p s_p(x_0), x_0 ∈ X_D. This provides the bound needed by Branch-and-Bound (BB).\n\nWe now present our approach through a series of intermediate problems. These may be amenable to simpler solutions, but the more complex solutions discussed finally lead to our algorithm.\n\n4.1 Maximization for One Domain Point\n\nWe first introduce notation: we index the source/domain points in X_S/X_D using i/j respectively. We denote by w^p_i = U_p(x_i) the unary potential of part p at location x_i. We shift the unary scores by the nominal offsets µ_p, which gives new source locations: x_i → x_i − µ_p, i.e. (h_i, v_i) → (h_i − η_p, v_i − ν_p). Finally, we drop p from w^p_i, H_p and V_p unless necessary. We can now write Eq. 3 as:\n\nm(h_0, v_0) = max_{i ∈ S_p} w_i − H (h_i − h_0)^2 − V (v_i − v_0)^2.    (4)\n\nTo evaluate Eq. 4 at (h_0, v_0) we use prioritized search over intervals of i ∈ S_p, starting from S_p and gradually narrowing down to the best i. 
To prioritize intervals, we use a KD-tree for the source points x_i ∈ X_{S_p} to quickly compute bounds of Eq. 4. Specifically, if S_n is the set of children of the n-th node of the KD-tree for S_p, consider the subproblem:\n\nm_n(h_0, v_0) = max_{i ∈ S_n} w_i − H (h_i − h_0)^2 − V (v_i − v_0)^2 = max_{i ∈ S_n} w_i + G_i,    (5)\n\nwhere G_i := −H (h_i − h_0)^2 − V (v_i − v_0)^2 stands for the geometric part of Eq. 5. We know that for all points (h_i, v_i) within S_n we have h_i ∈ [l_n, r_n] and v_i ∈ [b_n, t_n], where l, r, b, t are the left, right, bottom, top axes defining n's bounding box, B_n. We can then bound G_i within S_n from above by Ḡ_n and from below by G̲_n:\n\nḠ_n = −H max(⌈l − h_0⌉, ⌈h_0 − r⌉)^2 − V max(⌈b − v_0⌉, ⌈v_0 − t⌉)^2    (6)\nG̲_n = −H max(|l − h_0|, |h_0 − r|)^2 − V max(|b − v_0|, |v_0 − t|)^2,    (7)\n\nwhere ⌈·⌉ = max(·, 0), and Ḡ_n ≥ G_i ≥ G̲_n ∀i ∈ S_n. The upper bound is zero inside B_n and, when (h_0, v_0) is outside B_n, uses the boundaries of B_n that lie closest to (h_0, v_0). The lower bound uses the distance from (h_0, v_0) to the furthest point within B_n.\n\nRegarding the w_i term in Eq. 5, for both bounds we can use the value w_j, j = arg max_{i ∈ S_n} w_i. This is clearly suited for the upper bound. For the lower bound, since G_i ≥ G̲_n ∀i ∈ S_n, we have max_{i ∈ S_n} w_i + G_i ≥ w_j + G_j ≥ w_j + G̲_n. So w_j + G̲_n provides a proper lower bound for max_{i ∈ S_n} w_i + G_i. Summing up, we bound Eq. 5 as: w_j + Ḡ_n ≥ m_n(h_0, v_0) ≥ w_j + G̲_n.\n\n[Figure: domain-node l with children l1, l2; source nodes m, n, o with upper/lower bounds 7/0, 3/0, 8/4, and children m1 (4/2), m2 (6/1), n1 (2/0), n2 (3/1), o1 (5/4), o2 (8/6).]\n\nFigure 3: Supporter pruning: source nodes {m, n, o} are among the possible supporters of domain-node l. Their upper and lower bounds (shown as numbers to the right of each node) are used to prune them. 
Here, the upper bound for n (3) is smaller than the maximal lower bound among supporters (4, from o): this implies that the upper bound of n's children's contributions to l's children (shown here for l1) will not surpass the lower bound of o's children. We can thus safely remove n from the supporters.\n\nWe can use the upper bound in a prioritized search for the maximum of m(h_0, v_0), as described in Table 1. Starting with the root of the KD-tree, we expand its children nodes, estimate their priorities (upper bounds), and insert them in a priority queue. The search stops when the first leaf node is popped; this provides the maximizer, as its upper and lower bounds coincide and all other elements waiting in the queue have smaller upper bounds. The lower bound is useful in Sec. 4.2.\n\n4.2 Maximization for All Domain Points\n\nHaving described how KD-trees provide bounds in the single domain point case, we now describe how Dual Trees can speed up this operation when treating multiple domain points simultaneously. Specifically, we consider the following maximization problem:\n\nx* = arg max_{x ∈ X_D} m(x) = arg max_{j ∈ D} max_{i ∈ S} w_i − H (h_i − h_j)^2 − V (v_i − v_j)^2,    (8)\n\nwhere X_D/D is the set of domain points/indices and S are the source indices. The previous algorithm could deliver x* by computing m(x) repeatedly for each x ∈ X_D and picking the maximizer. But this would repeat similar checks for neighboring domain points, which can instead be done jointly.\n\nFor this, as in the original Dual Tree work, we build a second KD-tree for the domain points ('Domain tree', as opposed to 'Source tree'). The nodes in the Domain tree ('domain-nodes') correspond to intervals of domain points that are processed jointly. 
This saves repetitions of similar bounding operations, and quickly discards large domain areas with poor bounds.\n\nFor the bounding operations, as in Sec. 4.1 we consider the effect of source points contained in a node S_n of the Source tree. The difference is that now we bound the maximum of this quantity over domain points contained in a domain-node D_l. Specifically, we consider the quantity:\n\nm_{l,n} = max_{j ∈ D_l} max_{i ∈ S_n} w_i − H (h_i − h_j)^2 − V (v_i − v_j)^2.    (9)\n\nBounding G_{i,j} = −H (h_i − h_j)^2 − V (v_i − v_j)^2 involves two 2D intervals, one for the domain-node l and one for the source-node n. If the interval for node n is centered at (h_n, v_n) and has dimensions d_{h,n}, d_{v,n} (and similarly for l), we use d̄_h = (d_{h,l} + d_{h,n})/2, d̄_v = (d_{v,l} + d_{v,n})/2 and write:\n\nḠ_{l,n} = −H max(⌈h_n − h_l − d̄_h⌉, ⌈h_l − h_n − d̄_h⌉)^2 − V max(⌈v_n − v_l − d̄_v⌉, ⌈v_l − v_n − d̄_v⌉)^2\nG̲_{l,n} = −H max(h_n − h_l + d̄_h, h_l − h_n + d̄_h)^2 − V max(v_n − v_l + d̄_v, v_l − v_n + d̄_v)^2.\n\nWe illustrate these bounds in Fig. 2. The upper bound is zero if the boxes overlap, or else equals the (scaled) distance of their closest points. The lower bound uses the furthest points of the two boxes.\n\nAs in Sec. 4.1, we use w*_n = max_{i ∈ S_n} w_i for the first term in Eq. 9, and bound m_{l,n} as follows:\n\nG̲_{l,n} + w*_n ≤ m_{l,n} ≤ Ḡ_{l,n} + w*_n.    (10)\n\nThis expression bounds the maximal value m(x) that a point x in domain-node l can have using contributions from points in source-node n. Our initial goal was to find the maximum using all possible source point contributions. 
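As an illustration, the interval-to-interval geometric bounds above can be written out in a few lines. The sketch below is ours, not code from the paper, and the function and argument names are hypothetical; boxes are given by their centers and full dimensions, matching the d_{h,n}, d_{v,n} notation of the text.

```python
# Illustrative sketch (our own, not the paper's code) of the box-to-box
# geometric bounds: for any source point in box n and domain point in box l,
# -H*(hi-hj)^2 - V*(vi-vj)^2 lies between the returned (upper, lower) values.

def geometric_bounds(hl, vl, dhl, dvl, hn, vn, dhn, dvn, H, V):
    """Boxes are (center_h, center_v, width, height); H, V are precisions."""
    dh = 0.5 * (dhl + dhn)  # summed half-widths, the d-bar terms of the text
    dv = 0.5 * (dvl + dvn)
    # Upper bound: zero if the boxes overlap, otherwise the (scaled) squared
    # distance between their closest points.
    gap_h = max(hn - hl - dh, hl - hn - dh, 0.0)
    gap_v = max(vn - vl - dv, vl - vn - dv, 0.0)
    upper = -H * gap_h ** 2 - V * gap_v ** 2
    # Lower bound: the (scaled) squared distance between their furthest points.
    far_h = max(hn - hl + dh, hl - hn + dh)
    far_v = max(vn - vl + dv, vl - vn + dv)
    lower = -H * far_h ** 2 - V * far_v ** 2
    return upper, lower
```

For overlapping boxes the upper bound is 0, and both bounds tighten as the boxes shrink, which is what drives the refinement as the search descends the two trees.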
We now describe a recursive approach to limit the set of source-nodes considered, in a manner inspired by the 'multi-recursion' approach of [7].\n\nFor this, we associate every domain-node l with a set S_l of 'supporter' source-nodes that can yield the maximal contribution to points in l. We start by associating the root node of the Domain tree with the root node of the Source tree, which means that all domain-source point interactions are originally considered.\n\nWe then recursively increase the 'resolution' of the Domain tree in parallel with the 'resolution' of the Source tree. More specifically, to determine the supporters for a child m of domain-node l we consider only the children of the source-nodes in S_l; formally, denoting by pa and ch the parent and child operations respectively, we have S_m ⊂ ∪_{n ∈ S_{pa(m)}} {ch(n)}.\n\nOur goal is to reduce computation by keeping S_m small. This is achieved by pruning based on both the lower and upper bounds derived above. The main observation is that when we go from parents to children we decrease the number of source/domain points; this tightens the bounds, i.e. makes the upper bounds less optimistic and the lower bounds more optimistic. Denoting the maximal lower bound for contributions to parent node l by G̲_l = max_{n ∈ S_l} G̲_{l,n}, this means that G̲_k ≥ G̲_l if pa(k) = l. On the flip side, Ḡ_{k,q} ≤ Ḡ_{l,n} if pa(k) = l, pa(q) = n. This means that if for source-node n at the parent level Ḡ_{l,n} < G̲_l, then at the children level the children of n will contribute something worse than G̲_k, the maximal lower bound for l's child k. We therefore do not need to keep n among S_l: its children's contribution will certainly be worse than the best contribution from other nodes' children. Based on this observation we can reduce the set of supporters, while guaranteeing optimality.\n\nPseudocode summarizing this algorithm is provided in Table 1. 
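The pruning step above can be sketched as follows; this is a schematic rendition of the Bounding Routine of Table 1 with hypothetical names, where `bound` maps a candidate source-node to its (upper, lower) bound pair for the current domain-node.

```python
# Schematic supporter pruning (cf. the Bounding Routine of Table 1): keep only
# candidate source nodes whose upper bound can still beat the best lower bound.

def prune_supporters(candidates, bound):
    """Return (kept_nodes, max_upper, max_lower) for one domain node."""
    bounds = {n: bound(n) for n in candidates}
    max_lower = max(lb for (_, lb) in bounds.values())
    # A node with upper bound below max_lower cannot supply the maximal
    # contribution; '>=' (rather than strict '>') retains nodes whose
    # bounds have already converged.
    kept = [n for n in candidates if bounds[n][0] >= max_lower]
    max_upper = max(bounds[n][0] for n in kept)
    return kept, max_upper, max_lower
```

With the numbers of Fig. 3 (upper/lower bounds 7/0 for m, 3/0 for n, 8/4 for o), node n is pruned, since its upper bound 3 falls below o's lower bound 4.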
The bounds in Eq. 10 are used in a prioritized search algorithm for the maximum of m(x) over x. The algorithm uses a priority queue for Domain tree nodes, initialized with the root of the Domain tree (i.e. the whole range of possible locations x). At each iteration we pop a Domain tree node from the queue and compute upper bounds and supporters for its children, which are then pushed into the priority queue. The first leaf node that is popped contains the best domain location: its upper bound equals its lower bound, and all other nodes in the priority queue have smaller upper bounds, therefore they cannot result in a better solution.\n\n4.3 Maximization over All Domain Points and Multiple Parts: Branch and Bound for DPMs\n\nThe algorithm we described in the previous subsection is essentially a Branch-and-Bound (BB) algorithm for the maximization of a merit function\n\nx* = arg max_{x_0} m(x_0) = arg max_{(h_0, v_0)} max_{i ∈ S_p} w_i − H (h_i − h_0)^2 − V (v_i − v_0)^2    (11)\n\ncorresponding to a DPM with a single part (p). To see this, recall that at each step BB pops a domain of the function being maximized from the priority queue, splits it into subdomains (Branch), and computes a new upper bound for the subdomains (Bound). In our case Branching amounts to considering the two descendants of the domain node being popped, while Bounding amounts to taking the maximum of the upper bounds of the domain node's supporters.\n\nThe single-part DPM optimization problem is rather trivial, but adapting the technique to the multi-part case is now easy. For this, we rewrite Eq. 1 in a convenient form as:\n\nm(h_0, v_0) = Σ_{p=0}^{P} max_{i ∈ S} w_{p,i} − H_p (h^p_i − h_0)^2 − V_p (v^p_i − v_0)^2,    (12)\n\nusing the conventions we used in Eq. 4. 
Namely, we only consider using points in S for object parts, and subtract µ_p from (h_i, v_i) to yield simple quadratic forms; since µ_p is part-dependent, we now have a p superscript for h_i, v_i. Further, we have in general different H, V variables for different parts, so we brought back the p subscript for these. Finally, w_{p,i} depends on p, since the same image point will give different unary potentials for different object parts.\n\nFrom this form we realize that computing the upper bound of m(x) within a range of values of x, as required by Branch-and-Bound, is as easy as it was for the single terms in the previous section. Specifically, we have m(x) = Σ_{p=0}^{P} m_p(x), where m_p are the individual part contributions; since max_x Σ_{p=0}^{P} m_p(x) ≤ Σ_{p=0}^{P} max_x m_p(x), we can separately upper bound the individual part contributions, and sum them up to get an overall upper bound.\n\nPseudocode describing the maximization algorithm is provided in Table 1. Note that each part has its own KD-tree (ST[p] in Table 1): we build a separate Source tree per part using the part-specific coordinates (h^p, v^p) and weights w_{p,i}. 
Each part\u2019s contribution to the score is computed using the supporters it\nlends to the node; the total bound is obtained by summing the individual part bounds.\n\nSingle Domain Point\n\nIN: ST, x {Source Tree, Location x}\nOUT: arg maxxi\u2208ST m(x, xi)\nPush(S,ST.root);\nwhile 1 do\n\nPop(S,popped);\nif popped.UB = popped.LB then\n\nreturn popped;\n\nend if\nfor side = [Left,Right] do\n\nchild = popped.side;\nchild.UB = BoundU(x,child);\nchild.LB = BoundL(x,child);\nPush(S,child);\n\nend for\nend while\n\nMultiple Domain Points\n\nIN: ST, DT {Source/Domain Tree}\nOUT: arg maxx\u2208DT maxi\u2208ST m(x, xi)\nSeed = DT.root;\nSeed.supporters = ST.Root;\nPush(S,Seed);\nwhile 1 do\n\nPop(S,popped);\nif popped.UB = popped.LB then\n\nreturn popped;\n\nend if\nfor side = [Left,Right] do\n\nchild = popped.side;\nsupp = Descend(popped.supp);\nUB,supc = Bound(child,supp,DT,ST);\nchild.UB = UB;\nchild.supc = supc;\nPush(S,child);\n\nend for\nend while\n\nMultiple Domain Points, Multiple Parts\n\nIN: ST[P], DT {P Source Trees/Domain Tree}\nOUT: arg maxx\u2208DT Pp maxi\u2208ST [P ] m(x, xp, i)\nSeed = DT.root;\nfor p = 1 to P do\n\nSeed.supporters[p] = ST[p].Root;\n\nend for\nPush(S,Seed);\nwhile 1 do\n\nPop(S,popped);\nif popped.UB = popped.LB then\n\nreturn popped;\n\nend if\nfor side = [Left,Right] do\n\nchild = popped.side;\nUB = 0;\nfor part = 1:P do\n\nsupp = Descend(popped.supp[part])\nUP,s = Bound(child,supp,DT,ST[p]);\nchild.supp[part] = s;\nUB = UB + UP;\n\nend for\nchild.UB = UB;\nPush(S,child);\n\nend for\nend while\n\nBounding Routine\nIN: child,supporters,DT,ST\nOUT: supch, LB {Chosen supporters, Max LB}\nUB = \u2212\u221e; LB = \u221e;\nfor n \u2208 supporters do\n\nUB[n] = BoundU(DT.node[child],ST.node[n]);\nLB[n] = BoundL(DT.node[child],ST.node[n]);\n\nend for\nMaxLB = max(LB);\nsupch = supporters(\ufb01nd(UB>MaxLB));\nReturn supch, MaxLB;\n\nTable 1: Pseudocode for the algorithms presented in Section 4.\n\n5 Results - Application to Deformable Object 
Detection\n\nTo estimate the merit of BB we first compare with the mixtures-of-DPMs developed and distributed by [3]. We directly extend the Branch-and-Bound technique that we developed for a single DPM to deal with multiple scales and mixtures ('ORs') of DPMs [4, 21], by inserting all object hypotheses into the same queue. To detect multiple instances of objects at multiple scales we continue BB after getting the best scoring object hypothesis. As termination criterion we choose to stop when we pop an interval whose upper bound is below a fixed threshold.\n\nOur technique delivers essentially the same results as [4]. One minuscule difference is that BB uses floating point arithmetic for the part locations, while in GDT they are necessarily processed at integer resolution; other than that the results are identical. We therefore do not provide any detection performance curves, but only timing results.\n\nComing to time efficiency, in Fig. 4(a) we compare the results of the original DPM mixture model and our implementation. We use 2000 images from the Pascal dataset and a mix of models for different object classes (gains vary per category). We consider the standard detection scenario where we want to detect all objects in an image having score above a certain threshold. 
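The detection loop just described can be sketched as follows. This is our schematic reading of the procedure, not the released code: `branch`, `upper_bound` and `is_singleton` are hypothetical callables (splitting an interval, bounding the score over it, and testing whether it is a single location), and hypotheses from all scales and mixture components are assumed to share one queue.

```python
import heapq

# Schematic best-first detection loop: pop the interval with the largest upper
# bound, stop once that bound falls below the detection threshold, and report
# every singleton (tight-bound) interval popped along the way.

def branch_and_bound_detect(root_intervals, branch, upper_bound, is_singleton,
                            threshold=-1.0):
    """Yield (interval, score) detections in decreasing score order."""
    heap = [(-upper_bound(iv), i, iv) for i, iv in enumerate(root_intervals)]
    heapq.heapify(heap)
    counter = len(heap)  # unique tie-breaker so intervals are never compared
    while heap:
        neg_ub, _, iv = heapq.heappop(heap)
        if -neg_ub < threshold:   # best remaining bound is below threshold
            return
        if is_singleton(iv):      # bound is tight here: report a detection
            yield iv, -neg_ub
            continue
        for child in branch(iv):  # Branch, then Bound and re-queue
            heapq.heappush(heap, (-upper_bound(child), counter, child))
            counter += 1
```

On a toy 1-D problem (a score table with a halving `branch`), the loop returns the locations scoring above the threshold in decreasing order; with the paper's models each queue entry would instead be a (mixture component, scale, interval) hypothesis.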
[Figure 4 panels (log-scale speedup, 10^0-10^2, vs. image rank): (a) 'Speedup: Single object' with thresholds t = −0.4, −0.6, −0.8, −1.0; (b) 'Speedup: M-objects, 1-best' with M = 1, 5, 10, 20; (c) 'Speedup: 20-objects, k-best' with k = 1, 2, 5, 10; (d) 'Speedup - front-end' with k = 1.]\n\nFigure 4: (a) Single-object speedup of Branch and Bound compared to GDTs on images from the Pascal dataset; (b,c) multi-object speedup; (d) speedup of the front-end computation of the unary potentials. Please see text for details.\n\nWe show how the threshold affects the speedup we obtain; for a conservative threshold the speedup is typically tenfold, but as we become more aggressive it doubles.\n\nAs a second application, we consider the problem of identifying the 'dominant' object present in the image, i.e. the category that gives the largest score. Typically simpler models, like bag-of-words classifiers, are applied to this problem, based on the understanding that part-based models can be time-consuming, and that therefore applying a large set of models to an image would be impractical.\n\nOur claim is that Branch-and-Bound allows us to pursue a different approach, where in fact having more object categories can increase the speed of detection, if we leave the unary potential computation aside. Specifically, our approach can be directly extended to the multiple-object detection setting; as long as the scores computed by different object categories are commensurate, they can all be inserted in the same priority queue. In our experiments we observed that we can get a response faster by introducing more models. 
The reason for this is that including in our object repertoire a model giving a large score helps BB stop; otherwise BB keeps searching for another object.\n\nIn plots (b), (c) of Fig. 4 we show systematic results on the Pascal dataset. We compare the time that would be required by GDT to perform detection for all object categories considered in Pascal, to that of our system simultaneously exploring all models. In (b) we show how finding the first-best result is accelerated as the number of objects (M) increases, while in (c) we show how increasing the 'k' in 'k-best' affects the speedup. For small values of k the gains become more pronounced. Of course, if we use a fixed threshold the speedup does not change compared to plot (a), since essentially the objects do not 'interact' in any way (we do not use non-maximum suppression). But as we turn to the best-first problem, the speedup becomes dramatic, reaching up to a hundred-fold.\n\nWe note that the timings refer to the 'message passing' part implemented with GDT and not the computation of unary potentials, which is common to both models and is currently the bottleneck. Even though it is tangential to our contribution in this paper, we mention that, as shown in plot (d), we compute unary potentials approximately five times faster than the single-threaded convolution provided by [3] by exploiting Matlab's optimized matrix multiplication routines.\n\n6 Conclusions\n\nIn this work we have introduced Dual-Tree Branch-and-Bound for efficient part-based detection. We have used Dual Trees to compute upper bounds on the cost function of a part-based model and thereby derived a Branch-and-Bound algorithm for detection. Our algorithm is exact and makes no approximations, delivering results identical to the DPMs used in [4], but typically in 10-15 times less time. 
Further, we have shown that the \ufb02exibility of prioritized search allows us to consider new\ntasks, such as multiple-object detection, which yielded further speedups. The main challenge for\nfuture work will be to reduce the unary term computation cost; we intend to use BB for this task too.\n\n7 Acknowledgements\n\nWe are grateful to the authors of [3, 12, 9] for making their code available, and to the reviewers for\nconstructive feedback. This work was funded by grant ANR-10-JCJC -0205.\n\n8\n\n\fReferences\n\n[1] Y. Chen, L. Zhu, C. Lin, A. L. Yuille, and H. Zhang. Rapid inference on a novel and/or graph for object\n\ndetection, segmentation and parsing. In NIPS, 2007.\n\n[2] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part\n\nmodel. In CVPR, 2008.\n\n[3] P. F. Felzenszwalb, R. B. Girshick, and D. McAllester. Discriminatively trained deformable part models,\n\nrelease 4. http://www.cs.brown.edu/ pff/latent-release4/.\n\n[4] P. F. Felzenszwalb, R. B. Girshick, and D. A. McAllester. Cascade object detection with deformable part\n\nmodels. In CVPR, 2010.\n\n[5] P. F. Felzenszwalb and D. P. Huttenlocher. Distance transforms of sampled functions. Technical report,\n\nCornell CS, 2004.\n\n[6] V. Ferrari, M. J. Marin-Jimenez, and A. Zisserman. Progressive search space reduction for human pose\n\nestimation. In CVPR, 2008.\n\n[7] A. G. Gray and A. W. Moore. Nonparametric density estimation: Toward computational tractability. In\n\nSIAM International Conference on Data Mining, 2003.\n\n[8] E. Grimson. Object Recognition by Computer. MIT Press, 1991.\n[9] A. T. Ihler, E. B. Sudderth, W. T. Freeman, and A. S. Willsky. Ef\ufb01cient multiscale sampling from products\n\nof gaussian mixtures. In NIPS, 2003.\n\n[10] I. Kokkinos and A. Yuille. HOP: Hierarchical Object Parsing. In CVPR, 2009.\n[11] I. Kokkinos and A. L. Yuille. Inference and learning with hierarchical shape models. 
International Journal\n\nof Computer Vision, 93(2):201\u2013225, 2011.\n\n[12] C. Lampert, M. Blaschko, and T. Hofmann. Beyond sliding windows: Object localization by ef\ufb01cient\n\nsubwindow search. In CVPR, 2008.\n\n[13] C. H. Lampert. An ef\ufb01cient divide-and-conquer cascade for nonlinear object detection. In CVPR, 2010.\n[14] D. Lee, A. G. Gray, and A. W. Moore. Dual-tree fast gauss transforms. In NIPS, 2005.\n[15] A. Lehmann, B. Leibe, and L. V. Gool. Fast PRISM: Branch and Bound Hough Transform for Object\n\nClass Detection. International Journal of Computer Vision, 94(2):175\u2013197, 2011.\n\n[16] V. Lempitsky, A. Blake, and C. Rother. Image segmentation by branch-and-mincut. In ECCV, 2008.\n[17] P. Moreels, M. Maire, and P. Perona. Recognition by probabilistic hypothesis construction. In ECCV,\n\npage 55, 2004.\n\n[18] M. Pedersoli, A. Vedaldi, and J. Gonz`alez. A coarse-to-\ufb01ne approach for fast deformable object detection.\n\nIn CVPR, 2011.\n\n[19] B. Sapp, A. Toshev, and B. Taskar. Cascaded models for articulated pose estimation. In ECCV, 2010.\n[20] P. Viola and M. Jones. Rapid Object Detection using a Boosted Cascade of Simple Features. In CVPR,\n\n2001.\n\n[21] S. C. Zhu and D. Mumford. Quest for a Stochastic Grammar of Images. Foundations and Trends in\n\nComputer Graphics and Vision, 2(4):259\u2013362, 2007.\n\n9\n\n\f", "award": [], "sourceid": 1460, "authors": [{"given_name": "Iasonas", "family_name": "Kokkinos", "institution": null}]}