{"title": "Planar Ultrametrics for Image Segmentation", "book": "Advances in Neural Information Processing Systems", "page_first": 64, "page_last": 72, "abstract": "We study the problem of hierarchical clustering on planar graphs. We formulate this in terms of finding the closest ultrametric to a specified set of distances and solve it using an LP relaxation that leverages minimum cost perfect matching as a subroutine to efficiently explore the space of planar partitions. We apply our algorithm to the problem of hierarchical image segmentation.", "full_text": "Planar Ultrametrics for Image Segmentation\n\nJulian Yarkony\nExperian Data Lab\nSan Diego, CA 92130\njulian.yarkony@experian.com\n\nCharless C. Fowlkes\nDepartment of Computer Science\nUniversity of California Irvine\nfowlkes@ics.uci.edu\n\nAbstract\n\nWe study the problem of hierarchical clustering on planar graphs. We formulate\nthis in terms of finding the closest ultrametric to a specified set of distances and\nsolve it using an LP relaxation that leverages minimum cost perfect matching as\na subroutine to efficiently explore the space of planar partitions. We apply our\nalgorithm to the problem of hierarchical image segmentation.\n\n1 Introduction\n\nWe formulate hierarchical image segmentation from the perspective of estimating an ultrametric\ndistance over the set of image pixels that agrees closely with an input set of noisy pairwise distances.\nAn ultrametric space replaces the usual triangle inequality with the ultrametric inequality d(u, v) \u2264\nmax{d(u, w), d(v, w)} which captures the transitive property of clustering (if u and w are in the\nsame cluster and v and w are in the same cluster, then u and v must also be in the same cluster).\nThresholding an ultrametric immediately yields a partition into sets whose diameter is less than\nthe given threshold. 
Varying this distance threshold naturally produces a hierarchical clustering in\nwhich clusters at high thresholds are composed of clusters at lower thresholds.\nInspired by the approach of [1], our method represents an ultrametric explicitly as a hierarchical\ncollection of segmentations. Determining the appropriate segmentation at a single distance threshold\nis equivalent to finding a minimum-weight multicut in a graph with both positive and negative edge\nweights [3, 14, 2, 11, 20, 21, 4, 19, 7]. Finding an ultrametric imposes the additional constraint that\nthese multicuts are hierarchically consistent across different thresholds. We focus on the case where\nthe input distances are specified by a planar graph. This arises naturally in the domain of image\nsegmentation where elements are pixels or superpixels and distances are defined between neighbors,\nand allows us to exploit fast combinatorial algorithms for partitioning planar graphs that yield tighter\nLP relaxations than the local polytope relaxation often used in graphical inference [20].\nThe paper is organized as follows. We first introduce the closest ultrametric problem and the\nrelation between multicuts and ultrametrics. We then describe an LP relaxation that uses a delayed\ncolumn generation approach and exploits planarity to efficiently find cuts via the classic reduction\nto minimum-weight perfect matching [13, 8, 9, 10]. We apply our algorithm to the task of natural\nimage segmentation and demonstrate that our algorithm converges rapidly and produces optimal or\nnear-optimal solutions in practice.\n\n2 Closest Ultrametric and Multicuts\n\nLet $G = (V, E)$ be a weighted graph with non-negative edge weights $\\theta$ indexed by edges\n$e = (u, v) \\in E$. Our goal is to find an ultrametric distance $d_{(u,v)}$ over vertices of the graph that is\nclose to $\\theta$ in the sense that the distortion $\\sum_{(u,v) \\in E} \\|\\theta_{(u,v)} - d_{(u,v)}\\|_2^2$ is minimized. We begin by\nreformulating this closest ultrametric problem in terms of finding a set of nested multicuts in a family\nof weighted graphs.\nWe specify a partitioning or multicut of the vertices of the graph $G$ into components using a binary\nvector $\\bar{X} \\in \\{0, 1\\}^{|E|}$ where $\\bar{X}_e = 1$ indicates that the edge $e = (u, v)$ is \u201ccut\u201d and that the vertices\n$u$ and $v$ associated with the edge are in separate components of the partition. We use MCUT($G$)\nto denote the set of binary indicator vectors $\\bar{X}$ that represent valid multicuts of the graph $G$. For\nnotational simplicity, in the remainder of the paper we frequently omit the dependence on $G$ which\nis given as a fixed input.\nA necessary and sufficient condition for an indicator vector $\\bar{X}$ to define a valid multicut in $G$ is that\nfor every cycle of edges, if one edge on the cycle is cut then at least one other edge in the cycle must\nalso be cut. Let $C$ denote the set of all cycles in $G$ where each cycle $c \\in C$ is a set of edges and\n$c - \\hat{e}$ is the set of edges in cycle $c$ excluding edge $\\hat{e}$. We can express MCUT in terms of these cycle\ninequalities as:\n\n$$\\mathrm{MCUT} = \\left\\{ \\bar{X} \\in \\{0, 1\\}^{|E|} : \\sum_{e \\in c - \\hat{e}} \\bar{X}_e \\ge \\bar{X}_{\\hat{e}}, \\; \\forall c \\in C, \\hat{e} \\in c \\right\\} \\qquad (1)$$\n\nA hierarchical clustering of a graph can be described by a nested collection of multicuts. We denote\nthe space of valid hierarchical partitions with $L$ layers by $\\bar{\\Omega}_L$ which we represent by a set of $L$\nedge-indicator vectors $\\mathcal{X} = (\\bar{X}^1, \\bar{X}^2, \\bar{X}^3, \\ldots, \\bar{X}^L)$ in which any cut edge remains cut at all finer\nlayers of the hierarchy.\n\n$$\\bar{\\Omega}_L = \\{ (\\bar{X}^1, \\bar{X}^2, \\ldots, \\bar{X}^L) : \\bar{X}^l \\in \\mathrm{MCUT}, \\; \\bar{X}^l \\ge \\bar{X}^{l+1} \\; \\forall l \\} \\qquad (2)$$\n\nGiven a valid hierarchical clustering $\\mathcal{X}$, an ultrametric $d$ can be specified over the vertices of the\ngraph by choosing a sequence of real values $0 = \\delta_0 < \\delta_1 < \\delta_2 < \\ldots < \\delta_L$ that indicate a distance\nthreshold associated with each level $l$ of the hierarchical clustering. The ultrametric distance $d$\nspecified by the pair $(\\mathcal{X}, \\delta)$ assigns a distance to each pair of vertices $d_{(u,v)}$ based on the coarsest\nlevel of the clustering at which they remain in separate clusters. For pairs corresponding to an edge\nin the graph $(u, v) = e \\in E$ we can write this explicitly in terms of the multicut indicator vectors\nas:\n\n$$d_e = \\max_{l \\in \\{0, 1, \\ldots, L\\}} \\delta_l \\bar{X}^l_e = \\sum_{l=0}^{L} \\delta_l [\\bar{X}^l_e > \\bar{X}^{l+1}_e] \\qquad (3)$$\n\nWe assume by convention that $\\bar{X}^0_e = 1$ and $\\bar{X}^{L+1}_e = 0$. Pairs $(u, v)$ that do not correspond to an\nedge in the original graph can still be assigned a unique distance based on the coarsest level $l$ at\nwhich they lie in different connected components of the cut specified by $X^l$.\nTo compute the quality of an ultrametric $d$ with respect to an input set of edge weights $\\theta$, we measure\nthe squared $L_2$ difference between the edge weights and the ultrametric distance $\\|\\theta - d\\|_2^2$. To write\nthis compactly in terms of multicut indicator vectors, we construct a set of weights for each edge\nand layer, denoted $\\theta^l_e$, so that $\\sum_{l=0}^{m} \\theta^l_e = \\|\\theta_e - \\delta_m\\|^2$. These weights are given explicitly by the\ntelescoping series:\n\n$$\\theta^0_e = \\|\\theta_e\\|^2, \\qquad \\theta^l_e = \\|\\theta_e - \\delta_l\\|^2 - \\|\\theta_e - \\delta_{l-1}\\|^2 \\quad \\forall l \\ge 1 \\qquad (4)$$\n\nWe use $\\theta^l \\in \\mathbb{R}^{|E|}$ to denote the vector containing $\\theta^l_e$ for all $e \\in E$.\nFor a fixed number of levels $L$ and fixed set of thresholds $\\delta$, the problem of finding the closest\nultrametric $d$ can then be written as an integer linear program (ILP) over the edge cut indicators.\n\n$$\\min_{\\mathcal{X} \\in \\bar{\\Omega}_L} \\sum_{e \\in E} \\left\\| \\theta_e - \\sum_{l=0}^{L} \\delta_l [\\bar{X}^l_e > \\bar{X}^{l+1}_e] \\right\\|^2 = \\min_{\\mathcal{X} \\in \\bar{\\Omega}_L} \\sum_{e \\in E} \\sum_{l=0}^{L} \\|\\theta_e - \\delta_l\\|^2 (\\bar{X}^l_e - \\bar{X}^{l+1}_e) \\qquad (5)$$\n$$= \\min_{\\mathcal{X} \\in \\bar{\\Omega}_L} \\sum_{e \\in E} \\left( \\|\\theta_e\\|^2 \\bar{X}^0_e + \\sum_{l=1}^{L} \\left( \\|\\theta_e - \\delta_l\\|^2 - \\|\\theta_e - \\delta_{l-1}\\|^2 \\right) \\bar{X}^l_e - \\|\\theta_e - \\delta_L\\|^2 \\bar{X}^{L+1}_e \\right)$$\n$$= \\min_{\\mathcal{X} \\in \\bar{\\Omega}_L} \\sum_{l=0}^{L} \\sum_{e \\in E} \\theta^l_e \\bar{X}^l_e = \\min_{\\mathcal{X} \\in \\bar{\\Omega}_L} \\sum_{l=0}^{L} \\theta^l \\cdot \\bar{X}^l \\qquad (6)$$\n\nThis optimization corresponds to solving a collection of minimum-weight multicut problems where\nthe multicuts are constrained to be hierarchically consistent.\n\n[Figure 1 panels: (a) Linear combination of cut vectors; (b) Hierarchical cuts]\n\nFigure 1: (a) Any partitioning $X$ can be represented as a linear superposition of cuts $Z$ where\neach cut isolates a connected component of the partition and is assigned a weight $\\gamma = \\frac{1}{2}$ [20]. 
By\nintroducing auxiliary slack variables $\\beta$, we are able to represent a larger set of valid indicator\nvectors $X$ using fewer columns of $Z$. (b) By introducing additional slack variables at each layer of\nthe hierarchical segmentation, we can efficiently represent many hierarchical segmentations (here\n$\\{X^1, X^2, X^3\\}$) that are consistent from layer to layer while using only a small number of cut\nindicators as columns of $Z$.\n\nComputing minimum-weight multicuts (also known as correlation clustering) is NP-hard even in the\ncase of planar graphs [6]. A direct approach to finding an approximate solution to Eq 6 is to relax\nthe integrality constraints on $\\bar{X}^l$ and instead optimize over the whole polytope defined by the set of\ncycle inequalities. We use $\\Omega_L$ to denote the corresponding relaxation of $\\bar{\\Omega}_L$. While the resulting\npolytope is not the convex hull of MCUT, the integral vertices do correspond exactly to the set of\nvalid multicuts [12].\nIn practice, we found that applying a straightforward cutting-plane approach that successively adds\nviolated cycle inequalities to this relaxation of Eq 6 requires far too many constraints and is too\nslow to be useful. Instead, we develop a column generation approach tailored for planar graphs that\nallows for efficient and accurate approximate inference.\n\n3 The Cut Cone and Planar Multicuts\n\nConsider a partition of a planar graph into two disjoint sets of nodes. We denote the space of\nindicator vectors corresponding to such two-way cuts by CUT. A cut may yield more than two\nconnected components but it cannot produce every possible multicut (e.g., it cannot split a triangle\nof three nodes into three separate components). Let $Z \\in \\{0, 1\\}^{|E| \\times |\\mathrm{CUT}|}$ be an indicator matrix\nwhere each column specifies a valid two-way cut with $Z_{ek} = 1$ if and only if edge $e$ is cut in\ntwo-way cut $k$. 
The indicator vector of any multicut in a planar graph can be generated by a suitable\nlinear combination of cuts (columns of $Z$) that isolate the individual components from the rest of\nthe graph where the weight of each such cut is $\\frac{1}{2}$.\nLet $\\gamma \\in \\mathbb{R}^{|\\mathrm{CUT}|}$ be a vector specifying a positive weighted combination of cuts. The set\n$\\mathrm{CUT}^{\\triangle} = \\{Z\\gamma : \\gamma \\ge 0\\}$ is the conic hull of CUT or \u201ccut cone\u201d. Since any multicut can be expressed as a\nsuperposition of cuts, the cut cone is identical to the conic hull of MCUT. This equivalence suggests\nan LP relaxation of the minimum-cost multicut given by\n\n$$\\min_{\\gamma \\ge 0} \\; \\theta \\cdot Z\\gamma \\quad \\text{s.t.} \\quad Z\\gamma \\le 1 \\qquad (7)$$\n\nwhere the vector $\\theta \\in \\mathbb{R}^{|E|}$ specifies the edge weights. For the case of planar graphs, any solution to\nthis LP relaxation satisfies the cycle inequalities (see supplement and [12, 18, 10]).\nExpanded Multicut Objective: Since the matrix $Z$ contains an exponential number of cuts, Eq. 7\nis still intractable. Instead we consider an approximation using a constraint set $\\hat{Z}$ which is a subset\nof columns of $Z$. In previous work [20], we showed that since the optimal multicut may no longer\nlie in the span of the reduced cut matrix $\\hat{Z}$, it is useful to allow some values of $\\hat{Z}\\gamma$ to exceed 1 (see\nFigure 1(a) for an example).\nWe introduce a slack vector $\\beta \\ge 0$ that tracks the presence of any \u201covercut\u201d edges and prevents\nthem from contributing to the objective when the corresponding edge weight is negative. Let\n$\\theta^-_e = \\min(\\theta_e, 0)$ denote the non-positive component of $\\theta_e$. The expanded multi-cut objective is given by:\n\n$$\\min_{\\gamma \\ge 0, \\, \\beta \\ge 0} \\; \\theta \\cdot \\hat{Z}\\gamma - \\theta^- \\cdot \\beta \\quad \\text{s.t.} \\quad \\hat{Z}\\gamma - \\beta \\le 1 \\qquad (8)$$\n\nFor any edge $e$ such that $\\theta_e < 0$, any decrease in the objective from overcutting by an amount $\\beta_e$ is\nexactly compensated for in the objective by the term $-\\theta^-_e \\beta_e$.\nWhen $\\hat{Z}$ contains all cuts (i.e., $\\hat{Z} = Z$) then Eq 7 and Eq 8 are equivalent [20]. Further, if $\\gamma^\\star$ is the\nminimizer of Eq 8 when $\\hat{Z}$ only contains a subset of columns, then the edge indicator vector given\nby $X = \\min(1, \\hat{Z}\\gamma^\\star)$ still satisfies the cycle inequalities (see supplement for details).\n\n4 Expanded LP for Finding the Closest Ultrametric\n\nTo develop an LP relaxation of the closest ultrametric problem, we replace the multicut problem at\neach layer $l$ with the expanded multicut objective described by Eq 8. We let $\\gamma = \\{\\gamma^1, \\gamma^2, \\gamma^3 \\ldots \\gamma^L\\}$\nand $\\beta = \\{\\beta^1, \\beta^2, \\beta^3 \\ldots \\beta^L\\}$ denote the collection of weights and slacks for the levels of the\nhierarchy and let $\\theta^{+l}_e = \\max(0, \\theta^l_e)$ and $\\theta^{-l}_e = \\min(0, \\theta^l_e)$ denote the positive and negative components\nof $\\theta^l$.\nTo enforce hierarchical consistency between layers, we would like to add the constraint that\n$Z\\gamma^{l+1} \\le Z\\gamma^l$. However, this constraint is too rigid when $Z$ does not include all possible cuts.\nIt is thus computationally useful to introduce an additional slack vector associated with each level $l$\nand edge $e$ which we denote as $\\alpha = \\{\\alpha^1, \\alpha^2, \\alpha^3 \\ldots \\alpha^{L-1}\\}$. The introduction of $\\alpha^l_e$ allows for cuts\nrepresented by $Z\\gamma^l$ to violate the hierarchical constraint. We modify the objective so that violations\nto the original hierarchy constraint are paid for in proportion to $\\theta^{+l}_e$. The introduction of $\\alpha$ allows\nus to find valid ultrametrics while using a smaller number of columns of $Z$ than would\notherwise be required (illustrated in Figure 1(b)).\nWe call this relaxed closest ultrametric problem including the slack variable $\\alpha$ the expanded closest\nultrametric objective, written as:\n\n$$\\min_{\\gamma \\ge 0, \\, \\beta \\ge 0, \\, \\alpha \\ge 0} \\; \\sum_{l=1}^{L} \\theta^l \\cdot Z\\gamma^l + \\sum_{l=1}^{L} -\\theta^{-l} \\cdot \\beta^l + \\sum_{l=1}^{L-1} \\theta^{+l} \\cdot \\alpha^l \\qquad (9)$$\n$$\\text{s.t.} \\quad Z\\gamma^{l+1} + \\alpha^{l+1} \\le Z\\gamma^l + \\alpha^l \\;\\; \\forall l < L, \\qquad Z\\gamma^l - \\beta^l \\le 1 \\;\\; \\forall l$$\n\nwhere by convention we define $\\alpha^L = 0$ and we have dropped the constant $l = 0$ term from Eq 6.\nGiven a solution $(\\alpha, \\beta, \\gamma)$ we can recover a relaxed solution to the closest ultrametric problem (Eq.\n6) over $\\Omega_L$ by setting $X^l_e = \\min(1, \\max_{m \\ge l} (Z\\gamma^m)_e)$. In the supplement, we demonstrate that for\nany $(\\alpha, \\beta, \\gamma)$ that obeys the constraints in Eq 9, this thresholding operation yields a solution $X$ that\nlies in $\\Omega_L$ and achieves the same or lower objective value.\n\n5 The Dual Objective\n\nWe optimize the dual of the objective in Eq 9 using an efficient column generation approach based\non perfect matching. We introduce two sets of Lagrange multipliers $\\omega = \\{\\omega^1, \\omega^2, \\omega^3 \\ldots \\omega^{L-1}\\}$ and\n$\\lambda = \\{\\lambda^1, \\lambda^2, \\lambda^3 \\ldots \\lambda^L\\}$ corresponding to the between and within layer constraints respectively. For\nnotational convenience, let $\\omega^0 = 0$. The dual objective can then be written as\n\n$$\\max_{\\omega \\ge 0, \\, \\lambda \\ge 0} \\; \\sum_{l=1}^{L} -\\lambda^l \\cdot 1 \\qquad (10)$$\n$$\\text{s.t.} \\quad \\theta^{-l} \\le -\\lambda^l \\;\\; \\forall l, \\qquad -(\\omega^{l-1} - \\omega^l) \\le \\theta^{+l} \\;\\; \\forall l, \\qquad (\\theta^l + \\lambda^l + \\omega^{l-1} - \\omega^l) \\cdot Z \\ge 0 \\;\\; \\forall l$$\n\nThe dual LP can be interpreted as finding a small modification of the original edge weights $\\theta^l$ so\nthat every possible two-way cut of each resulting graph at level $l$ has non-negative weight. Observe\nthat the introduction of the two slack terms $\\alpha$ and $\\beta$ in the primal problem (Eq 9) results in bounds\non the Lagrange multipliers $\\lambda$ and $\\omega$ in the dual problem in Eq 10. In practice these dual constraints\nturn out to be essential for efficient optimization and constitute the core contribution of this paper.\n\nAlgorithm 1 Dual Closest Ultrametric via Cutting Planes\n  $\\hat{Z}^l \\leftarrow \\{\\} \\;\\; \\forall l$, residual $\\leftarrow -\\infty$\n  while residual < 0 do\n    $\\{\\omega\\}, \\{\\lambda\\} \\leftarrow$ Solve Eq 10 given $\\hat{Z}$\n    residual = 0\n    for l = 1 : L do\n      $z^l \\leftarrow \\arg\\min_{z \\in \\mathrm{CUT}} (\\theta^l + \\lambda^l + \\omega^{l-1} - \\omega^l) \\cdot z$\n      residual $\\leftarrow$ residual $+ \\frac{3}{2} (\\theta^l + \\lambda^l + \\omega^{l-1} - \\omega^l) \\cdot z^l$\n      $\\{z(1), z(2), \\ldots, z(M)\\} \\leftarrow$ isocuts($z^l$)\n      $\\hat{Z}^l \\leftarrow \\hat{Z}^l \\cup \\{z(1), z(2), \\ldots, z(M)\\}$\n    end for\n  end while\n\n6 Solving the Dual via Cutting Planes\n\nThe chief complexity of the dual LP is contained in the constraints including $Z$ which encode\nnon-negativity of an exponential number of cuts of the graph represented by the columns of $Z$. To\ncircumvent the difficulty of explicitly enumerating the columns of $Z$, we employ a cutting plane\nmethod that efficiently searches for additional violated constraints (columns of $Z$) which are then\nsuccessively added.\nLet $\\hat{Z}$ denote the current working set of columns. 
Our dual optimization algorithm iterates over\nthe following three steps: (1) solve the dual LP with $\\hat{Z}$, (2) find the most violated constraint of the\nform $(\\theta^l + \\lambda^l + \\omega^{l-1} - \\omega^l) \\cdot Z \\ge 0$ for layer $l$, (3) append a column to the matrix $\\hat{Z}$ for each\nsuch cut found. We terminate when no violated constraints exist or a computational budget has been\nexceeded.\nFinding Violated Constraints: Identifying columns to add to $\\hat{Z}$ is carried out for each layer $l$\nseparately. Finding the most violated constraint of the full problem corresponds to computing the\nminimum-weight cut of a graph with edge weights $\\theta^l + \\lambda^l + \\omega^{l-1} - \\omega^l$. If this cut has non-negative\nweight then all the constraints are satisfied, otherwise we add the corresponding cut indicator vector\nas an additional column of $\\hat{Z}$.\nTo generate a new constraint for layer $l$ based on the current Lagrange multipliers, we solve\n\n$$z^l = \\arg\\min_{z \\in \\mathrm{CUT}} \\sum_{e \\in E} (\\theta^l_e + \\lambda^l_e + \\omega^{l-1}_e - \\omega^l_e) z_e \\qquad (11)$$\n\nand subsequently add the new constraints from all layers to our LP, $\\hat{Z} \\leftarrow [\\hat{Z}, z^1, z^2, \\ldots z^L]$.\nUnlike the multicut problem, finding a minimum-weight (two-way) cut in a planar graph can be solved exactly by a\nreduction to minimum-weight perfect matching. This is a classic result that, e.g., provides an exact\nsolution for the ground state of a 2D lattice Ising model without a ferromagnetic field [13, 8, 9, 10]\nin $O(N^{3/2} \\log N)$ time [15].\n\n[Figure 2 plots: (a) Bound vs. Time (sec), curves UB and LB; (b) Counts vs. Objective ratio (UCM / UM)]\n\nFigure 2: (a) The average convergence of the upper (blue) and lower-bounds (red) as a function\nof running time. Values plotted are the gap between the bound and the best lower-bound computed\n(at termination) for a given problem instance. This relative gap is averaged over problem instances\nwhich have not yet converged at a given time point. We indicate the percentage of problem instances\nthat have yet to terminate using black bars marking [95, 85, 75, 65, ..., 5] percent. (b) Histogram of\nthe ratio of closest ultrametric objective values for our algorithm (UM) and the baseline clustering\nproduced by UCM. All ratios were less than 1 showing that in no instances did UM produce a worse\nsolution than UCM.\n\nComputing a lower bound: At a given iteration, prior to adding a newly generated set of constraints\nwe can compute the total residual constraint violation over all layers of hierarchy by\n$\\Delta = \\sum_l (\\theta^l + \\lambda^l + \\omega^{l-1} - \\omega^l) \\cdot z^l$. In the supplement we demonstrate that the value of the dual objective plus\n$\\frac{3}{2} \\Delta$ is a lower-bound on the relaxed closest ultrametric problem in Eq 9. Thus, as the costs of the\nminimum-weight matchings approach zero from below, the objective of the reduced problem over\n$\\hat{Z}$ approaches an accurate lower-bound on optimization over $\\bar{\\Omega}_L$.\nExpanding generated cut constraints: When a given cut $z^l$ produces more than two connected\ncomponents, we found it useful to add a constraint corresponding to each component, following the\napproach of [20]. Let the number of connected components of $z^l$ be denoted $M$. For each of the\n$M$ components we then add one column to $\\hat{Z}$ corresponding to the cut that isolates that connected\ncomponent from the rest. This allows more flexibility in representing the final optimum multicut as\nsuperpositions of these components. In addition, we also found it useful in practice to maintain a\nseparate set of constraints $\\hat{Z}^l$ for each layer $l$. Maintaining independent constraints $\\hat{Z}^1, \\hat{Z}^2, \\ldots, \\hat{Z}^L$\ncan result in a smaller overall LP.\nSpeeding convergence of $\\omega$: We found that adding an explicit penalty term to the objective that\nencourages small values of $\\omega$ speeds up convergence dramatically with no loss in solution quality.\nIn our experiments, this penalty is scaled by a parameter $\\epsilon = 10^{-4}$ which is chosen to be extremely\nsmall in magnitude relative to the values of $\\theta$ so that it only has an influence when no other \u201cforces\u201d\nare acting on a given term in $\\omega$.\nPrimal Decoding: Algorithm 1 gives a summary of the dual solver which produces a lower-bound\nas well as a set of cuts described by the constraint matrices $\\hat{Z}^l$. The subroutine isocuts($z^l$) computes\nthe set of cuts that isolate each connected component of $z^l$. To generate a hierarchical clustering,\nwe solve the primal, Eq 9, using the reduced set $\\hat{Z}$ in order to recover a fractional solution\n$X^l_e = \\min(1, \\max_{m \\ge l} (\\hat{Z}^m \\gamma^m)_e)$. We use an LP solver (IBM CPLEX) which provides this primal solution\n\u201cfor free\u201d when solving the dual in Alg. 1.\nWe round the fractional primal solution $X$ to a discrete hierarchical clustering by thresholding:\n$\\bar{X}^l_e \\leftarrow [X^l_e > t]$. We then repair (uncut) any cut edges that lie inside a connected component. In our\nimplementation we test a few discrete thresholds $t \\in \\{0, 0.2, 0.4, 0.6, 0.8\\}$ and take that threshold\nthat yields $\\bar{X}$ with the lowest cost. After each pass through the loop of Alg. 1 we compute these\nupper-bounds and retain the optimum solution observed thus far.\n\nFigure 3: (a) Boundary detection performance of our closest ultrametric algorithm (UM) and the\nbaseline ultrametric contour maps algorithm with (UCM) and without (UCM-L) length weighting\n[5] on BSDS. Black circles indicate thresholds used in the closest UM optimization. 
(b) Anytime\nperformance: F-measure on the BSDS benchmark as a function of run-time. UM, UCM with and\nwithout length weighting achieve a maximum F-measure of 0.728, 0.726, and 0.718 respectively.\n\n7 Experiments\n\nWe applied our algorithm to segmenting images from the Berkeley Segmentation Data set (BSDS)\n[16]. We use superpixels generated by performing an oriented watershed transform on the output\nof the global probability of boundary (gPb) edge detector [17] and construct a planar graph whose\nvertices are superpixels with edges connecting neighbors in the image plane whose base distance $\\theta$\nis derived from gPb.\nLet $gPb_e$ be the local estimate of boundary contrast given by averaging the gPb classifier output\nover the boundary between a pair of neighboring superpixels. We truncate extreme values to enforce\nthat $gPb_e \\in [\\epsilon, 1 - \\epsilon]$ with $\\epsilon = 0.001$ and set $\\theta_e = \\log\\left(\\frac{gPb_e}{1 - gPb_e}\\right) + \\log\\left(\\frac{1 - \\epsilon}{\\epsilon}\\right)$. The additive offset\nassures that $\\theta_e \\ge 0$. In our experiments we use a fixed set of eleven distance threshold levels $\\{\\delta_l\\}$\nchosen to uniformly span the useful range of threshold values [9.6, 12.6]. Finally, we weighted edges\nproportionally to the length of the corresponding boundary in the image.\nWe performed dual cutting plane iterations until convergence or 2000 seconds had passed. Lower-bounds\nfor the BSDS segmentations were on the order of $-10^3$ or $-10^4$. We terminate when the\ntotal residual is greater than $-2 \\times 10^{-4}$. All code was written in MATLAB using the Blossom\nV implementation of minimum-weight perfect matching [15] and the IBM ILOG CPLEX LP solver\nwith default options.\nBaseline: We compare our results with the hierarchical clusterings produced by the Ultrametric\nContour Map (UCM) [5]. 
UCM performs agglomerative clustering of superpixels and assigns the\nlength-weighted averaged gPb value as the distance between each pair of merged regions. While\nUCM was not explicitly designed to find the closest ultrametric, it provides a strong baseline for\nhierarchical clustering. To compute the closest L-level ultrametric corresponding to the UCM\nclustering result, we solve the minimization in Eq. 6 while restricting each multicut to be the partition\nat some level of the UCM hierarchy.\nConvergence and Timing: Figure 2 shows the average behavior of convergence as a function of\nruntime. We found the upper-bound given by the cost of the decoded integer solution and the\nlower-bound estimated by the dual LP are very close. The integrality gap is typically within 0.1% of the\nlower-bound and never more than 1%. Convergence of the dual is achieved quite rapidly; most\ninstances require fewer than 100 iterations to converge with roughly linear growth in the size of the\nLP at each iteration as cutting planes are added. In Figure 2(b) we display a histogram, computed over test\nimage problem instances, of the cost of UCM solutions relative to those produced by closest\nultrametric (UM) estimated by our method. A ratio of less than 1 indicates that our approach generated\na solution with a lower distortion ultrametric. In no problem instance did UCM outperform our UM\nalgorithm.\n\n[Figure 3 plots: (a) Precision vs. Recall for UCM, UCM-L, and UM; (b) Maximum F-measure vs. Time (sec) for UM, UCM-L, and UCM]\n\n[Figure 4 column labels: UM, MC, UM, MC]\n\nFigure 4: The proposed closest ultrametric (UM) enforces consistency across levels while\nperforming independent multi-cut clustering (MC) at each threshold does not guarantee a hierarchical\nsegmentation (cf. first image, columns 3 and 4). 
In the second image, hierarchical segmentation\n(UM) better preserves semantic parts of the two birds while correctly merging the background\nregions.\n\nSegmentation Quality: Figure 3 shows the segmentation benchmark accuracy of our closest\nultrametric algorithm (denoted UM) along with the baseline ultrametric contour maps algorithm (UCM)\nwith and without length weighting [5]. In terms of segmentation accuracy, UM performs nearly\nidentically to the state-of-the-art UCM algorithm with some small gains in the high-precision regime. It\nis worth noting that the BSDS benchmark does not provide strong penalties for small leaks between\ntwo segments when the total number of boundary pixels involved is small. Our algorithm may find\nstrong application in domains where the local boundary signal is noisier (e.g., biological imaging)\nor when under-segmentation is more heavily penalized.\nWhile our cutting-plane approach is slower than agglomerative clustering, it is not necessary to wait\nfor convergence in order to produce high quality results. We found that while the upper and lower\nbounds continue to tighten over time, the clustering performance as measured by precision-recall\nis often nearly optimal after only ten seconds and remains stable. Figure 3 shows a plot of the\nF-measure achieved by UM as a function of time.\nImportance of enforcing hierarchical constraints: Although independently finding multicuts at\ndifferent thresholds often produces hierarchical clusterings, this is by no means guaranteed. We ran\nAlgorithm 1 while setting $\\omega^l_e = 0$, allowing each layer to be solved independently. Figure 4 shows\nexamples where hierarchical constraints between layers improve segmentation quality relative to\nindependent clustering at each threshold.\n\n8 Conclusion\n\nWe have introduced a new method for approximating the closest ultrametric on planar graphs that\nis applicable to hierarchical image segmentation. 
Our contribution is a dual cutting plane approach\nthat exploits the introduction of novel slack terms that allow for representing a much larger space of\nsolutions with relatively few cutting planes. This yields an efficient algorithm that provides rigorous\nbounds on the quality of the resulting solution. We empirically observe that our algorithm rapidly\nproduces compelling image segmentations along with lower- and upper-bounds that are nearly tight\non the benchmark BSDS test data set.\nAcknowledgements: JY acknowledges the support of Experian; CF acknowledges the support of NSF\ngrants IIS-1253538 and DBI-1262547.\n\nReferences\n[1] Nir Ailon and Moses Charikar. Fitting tree metrics: Hierarchical clustering and phylogeny. In Foundations of Computer Science, pages 73\u201382, 2005.\n[2] Bjoern Andres, Joerg H. Kappes, Thorsten Beier, Ullrich Kothe, and Fred A. Hamprecht. Probabilistic image segmentation with closedness constraints. In Proc. of ICCV, pages 2611\u20132618, 2011.\n[3] Bjoern Andres, Thorben Kroger, Kevin L. Briggman, Winfried Denk, Natalya Korogod, Graham Knott, Ullrich Kothe, and Fred A. Hamprecht. Globally optimal closed-surface segmentation for connectomics. In Proc. of ECCV, 2012.\n[4] Bjoern Andres, Julian Yarkony, B. S. Manjunath, Stephen Kirchhoff, Engin Turetken, Charless Fowlkes, and Hanspeter Pfister. Segmenting planar superpixel adjacency graphs w.r.t. non-planar superpixel affinity graphs. In Proc. of EMMCVPR, 2013.\n[5] Pablo Arbelaez, Michael Maire, Charless Fowlkes, and Jitendra Malik. Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 33(5):898\u2013916, May 2011.\n[6] Yoram Bachrach, Pushmeet Kohli, Vladimir Kolmogorov, and Morteza Zadimoghaddam. Optimal coalition structure generation in cooperative graph games. In Proc. of AAAI, 2013.\n[7] Shai Bagon and Meirav Galun. Large scale correlation clustering. In CoRR, abs/1112.2903, 2011.\n[8] F Barahona. On the computational complexity of Ising spin glass models. Journal of Physics A: Mathematical, Nuclear and General, 15(10):3241\u20133253, April 1982.\n[9] F Barahona. On cuts and matchings in planar graphs. Mathematical Programming, 36(2):53\u201368, November 1991.\n[10] F Barahona and A Mahjoub. On the cut polytope. Mathematical Programming, 60(1-3):157\u2013173, September 1986.\n[11] Thorsten Beier, Thorben Kroeger, Jorg H Kappes, Ullrich Kothe, and Fred A Hamprecht. Cut, glue, and cut: A fast, approximate solver for multicut partitioning. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 73\u201380, 2014.\n[12] Michel Deza and Monique Laurent. Geometry of cuts and metrics, volume 15. Springer Science & Business Media, 1997.\n[13] Michael Fisher. On the dimer solution of planar Ising models. Journal of Mathematical Physics, 7(10):1776\u20131781, 1966.\n[14] Sungwoong Kim, Sebastian Nowozin, Pushmeet Kohli, and Chang Dong Yoo. Higher-order correlation clustering for image segmentation. In Advances in Neural Information Processing Systems 25, pages 1530\u20131538, 2011.\n[15] Vladimir Kolmogorov. Blossom V: a new implementation of a minimum cost perfect matching algorithm. Mathematical Programming Computation, 1(1):43\u201367, 2009.\n[16] David Martin, Charless Fowlkes, Doron Tal, and Jitendra Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. of ICCV, pages 416\u2013423, 2001.\n[17] David Martin, Charless C. Fowlkes, and Jitendra Malik. Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Trans. Pattern Anal. Mach. Intell., 26(5):530\u2013549, May 2004.\n[18] Julian Yarkony. Analyzing PlanarCC. NIPS 2014 workshop, 2014.\n[19] Julian Yarkony, Thorsten Beier, Pierre Baldi, and Fred A Hamprecht. Parallel multicut segmentation via dual decomposition. In New Frontiers in Mining Complex Patterns, 2014.\n[20] Julian Yarkony, Alexander Ihler, and Charless Fowlkes. Fast planar correlation clustering for image segmentation. In Proc. of ECCV, 2012.\n[21] Chong Zhang, Julian Yarkony, and Fred A. Hamprecht. Cell detection and segmentation using correlation clustering. In MICCAI, volume 8673, pages 9\u201316, 2014.\n", "award": [], "sourceid": 41, "authors": [{"given_name": "Julian", "family_name": "Yarkony", "institution": "Dr."}, {"given_name": "Charless", "family_name": "Fowlkes", "institution": "UC Irvine"}]}