{"title": "Rounding-based Moves for Metric Labeling", "book": "Advances in Neural Information Processing Systems", "page_first": 109, "page_last": 117, "abstract": "Metric labeling is a special case of energy minimization for pairwise Markov random fields. The energy function consists of arbitrary unary potentials, and pairwise potentials that are proportional to a given metric distance function over the label set. Popular methods for solving metric labeling include (i) move-making algorithms, which iteratively solve a minimum st-cut problem; and (ii) the linear programming (LP) relaxation based approach. In order to convert the fractional solution of the LP relaxation to an integer solution, several randomized rounding procedures have been developed in the literature. We consider a large class of parallel rounding procedures, and design move-making algorithms that closely mimic them. We prove that the multiplicative bound of a move-making algorithm exactly matches the approximation factor of the corresponding rounding procedure for any arbitrary distance function. Our analysis includes all known results for move-making algorithms as special cases.", "full_text": "Rounding-based Moves for Metric Labeling\n\nM. Pawan Kumar\n\nEcole Centrale Paris & INRIA Saclay\n\npawan.kumar@ecp.fr\n\nAbstract\n\nMetric labeling is a special case of energy minimization for pairwise Markov ran-\ndom \ufb01elds. The energy function consists of arbitrary unary potentials, and pair-\nwise potentials that are proportional to a given metric distance function over the\nlabel set. Popular methods for solving metric labeling include (i) move-making\nalgorithms, which iteratively solve a minimum st-cut problem; and (ii) the linear\nprogramming (LP) relaxation based approach. In order to convert the fractional\nsolution of the LP relaxation to an integer solution, several randomized round-\ning procedures have been developed in the literature. 
We consider a large class of parallel rounding procedures, and design move-making algorithms that closely mimic them. We prove that the multiplicative bound of a move-making algorithm exactly matches the approximation factor of the corresponding rounding procedure for any arbitrary distance function. Our analysis includes all known results for move-making algorithms as special cases.\n\n1 Introduction\n\nA Markov random field (MRF) is a graph whose vertices are random variables, and whose edges specify a neighborhood over the random variables. Each random variable can be assigned a value from a set of labels, resulting in a labeling of the MRF. The putative labelings of an MRF are quantitatively distinguished from each other by an energy function, which is the sum of potential functions that depend on the cliques of the graph. An important optimization problem associated with the MRF framework is energy minimization, that is, finding a labeling with the minimum energy.\n\nMetric labeling is a special case of energy minimization, which models several useful low-level vision tasks [3, 4, 18]. It is characterized by a finite, discrete label set and a metric distance function over the labels. The energy function in metric labeling consists of arbitrary unary potentials, and pairwise potentials that are proportional to the distance between the labels assigned to the two neighboring random variables. The problem is known to be NP-hard [20]. Two popular approaches for metric labeling are: (i) move-making algorithms [4, 8, 14, 15, 21], which iteratively improve the labeling by solving a minimum st-cut problem; and (ii) the linear programming (LP) relaxation [5, 13, 17, 22], which is obtained by dropping the integrality constraints in the corresponding integer programming formulation. Move-making algorithms are very efficient due to the availability of fast minimum st-cut solvers [2] and are very popular in the computer vision community. 
In contrast, the LP relaxation is significantly slower, despite the development of specialized solvers [7, 9, 11, 12, 16, 19, 22, 23, 24, 25]. However, when used in conjunction with randomized rounding algorithms, the LP relaxation provides the best known polynomial-time theoretical guarantees for metric labeling [1, 5, 10].\n\nAt first sight, the difference between move-making algorithms and the LP relaxation appears to be the standard accuracy vs. speed trade-off. However, for some special cases of distance functions, it has been shown that appropriately designed move-making algorithms can match the theoretical guarantees of the LP relaxation [14, 15, 20]. In this paper, we extend this result to a large class of randomized rounding procedures, which we call parallel rounding. In particular, we prove that for any arbitrary (semi-)metric distance function, there exist move-making algorithms that match the theoretical guarantees provided by parallel rounding. The proofs, the various corollaries of our theorems (which cover all previously known guarantees) and our experimental results are deferred to the accompanying technical report.\n\n2 Preliminaries\n\nMetric Labeling. The problem of metric labeling is defined over an undirected graph G = (X, E). The vertices X = {X1, X2, \u00b7\u00b7\u00b7 , Xn} are random variables, and the edges E specify a neighborhood relationship over the random variables. Each random variable can be assigned a value from the label set L = {l1, l2, \u00b7\u00b7\u00b7 , lh}. We assume that we are also provided with a metric distance function d : L \u00d7 L \u2192 R+ over the labels.\n\nWe refer to an assignment of values to all the random variables as a labeling. In other words, a labeling is a vector x \u2208 L^n, which specifies the label xa assigned to each random variable Xa. 
The h^n different labelings are quantitatively distinguished from each other by an energy function Q(x), which is defined as follows:\n\nQ(x) = \u2211_{Xa\u2208X} \u03b8a(xa) + \u2211_{(Xa,Xb)\u2208E} wab d(xa, xb).\n\nHere, the unary potentials \u03b8a(\u00b7) are arbitrary, and the edge weights wab are non-negative. Metric labeling requires us to find a labeling with the minimum energy. It is known to be NP-hard.\n\nMultiplicative Bound. As metric labeling plays a central role in low-level vision, several approximate algorithms have been proposed in the literature. A common theoretical measure of accuracy for an approximate algorithm is the multiplicative bound. In this work, we are interested in the multiplicative bound of an algorithm with respect to a distance function. Formally, given a distance function d, the multiplicative bound of an algorithm is said to be B if the following condition is satisfied for all possible values of unary potentials \u03b8a(\u00b7) and non-negative edge weights wab:\n\n\u2211_{Xa\u2208X} \u03b8a(\u02c6xa) + \u2211_{(Xa,Xb)\u2208E} wab d(\u02c6xa, \u02c6xb) \u2264 \u2211_{Xa\u2208X} \u03b8a(x\u2217a) + B \u2211_{(Xa,Xb)\u2208E} wab d(x\u2217a, x\u2217b). (1)\n\nHere, \u02c6x is the labeling estimated by the algorithm for the given values of unary potentials and edge weights, and x\u2217 is an optimal labeling. Multiplicative bounds are greater than or equal to 1, and are invariant to reparameterizations of the unary potentials. A multiplicative bound B is said to be tight if the above inequality holds as an equality for some value of unary potentials and edge weights.\n\nLinear Programming Relaxation. 
An overcomplete representation of a labeling can be specified using the following variables: (i) unary variables ya(i) \u2208 {0, 1} for all Xa \u2208 X and li \u2208 L such that ya(i) = 1 if and only if Xa is assigned the label li; and (ii) pairwise variables yab(i, j) \u2208 {0, 1} for all (Xa, Xb) \u2208 E and li, lj \u2208 L such that yab(i, j) = 1 if and only if Xa and Xb are assigned labels li and lj respectively. This allows us to formulate metric labeling as follows:\n\nmin_y \u2211_{Xa\u2208X} \u2211_{li\u2208L} \u03b8a(li) ya(i) + \u2211_{(Xa,Xb)\u2208E} \u2211_{li,lj\u2208L} wab d(li, lj) yab(i, j),\ns.t. \u2211_{li\u2208L} ya(i) = 1, \u2200Xa \u2208 X,\n\u2211_{lj\u2208L} yab(i, j) = ya(i), \u2200(Xa, Xb) \u2208 E, li \u2208 L,\n\u2211_{li\u2208L} yab(i, j) = yb(j), \u2200(Xa, Xb) \u2208 E, lj \u2208 L,\nya(i) \u2208 {0, 1}, yab(i, j) \u2208 {0, 1}, \u2200Xa \u2208 X, (Xa, Xb) \u2208 E, li, lj \u2208 L.\n\nBy relaxing the final set of constraints so that the optimization variables can take any value between 0 and 1 inclusive, we obtain a linear program (LP). The computational complexity of solving the LP relaxation is polynomial in the size of the problem.\n\nRounding Procedure. In order to prove theoretical guarantees of the LP relaxation, it is common to use a rounding procedure that can convert a feasible fractional solution y of the LP relaxation to a feasible integer solution \u02c6y of the integer linear program. Several rounding procedures have been proposed in the literature. In this work, we focus on the randomized parallel rounding procedures proposed in [5, 10]. These procedures have the property that, given a fractional solution y, the probability of assigning a label li \u2208 L to a random variable Xa \u2208 X is equal to ya(i), that is,\n\nPr(\u02c6ya(i) = 1) = ya(i). (2)\n\nWe will describe the various rounding procedures in detail in sections 3-5. 
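Property (2) can be checked mechanically for the simplest member of this family, the single-stage rounding described in Section 3: every variable is rounded with the same draw r against its cumulative label distribution. A minimal sketch follows; the dict-of-lists encoding and the function name are illustrative assumptions of this example, not notation from the paper.

```python
import bisect
import itertools

def complete_round(y, r):
    """Round every variable with the same draw r, assigning label i to X_a
    when Y_a(i-1) < r <= Y_a(i), where Y_a is the cumulative distribution
    of y_a.  Because r is uniform on (0, 1], label i is selected with
    probability exactly y_a(i), which is property (2)."""
    labels = {}
    for a, dist in y.items():
        cum = list(itertools.accumulate(dist))  # Y_a(1), ..., Y_a(h)
        labels[a] = bisect.bisect_left(cum, r)  # 0-based label index
    return labels

# Two variables with different fractional distributions, one shared r.
print(complete_round({"a": [0.3, 0.7], "b": [0.6, 0.4]}, r=0.5))
# → {'a': 1, 'b': 0}
```

Using a fresh draw per variable would preserve the per-variable marginals but destroy the coupling between neighbors, which is exactly the failure mode discussed at the start of Section 3.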
For now, we would like to note that our reason for focusing on the parallel rounding of [5, 10] is that they provide the best known polynomial-time theoretical guarantees for metric labeling. Specifically, we are interested in their approximation factor, which is defined next.\n\nApproximation Factor. Given a distance function d, the approximation factor for a rounding procedure is said to be F if the following condition is satisfied for all feasible fractional solutions y:\n\nE[ \u2211_{li,lj\u2208L} d(li, lj) \u02c6ya(i) \u02c6yb(j) ] \u2264 F \u2211_{li,lj\u2208L} d(li, lj) yab(i, j). (3)\n\nHere, \u02c6y refers to the integer solution, and the expectation is taken with respect to the randomized rounding procedure applied to the feasible solution y.\n\nGiven a rounding procedure with an approximation factor of F, an optimal fractional solution y\u2217 of the LP relaxation can be rounded to a labeling \u02c6y that satisfies the following condition:\n\nE[ \u2211_{Xa\u2208X} \u2211_{li\u2208L} \u03b8a(li) \u02c6ya(i) + \u2211_{(Xa,Xb)\u2208E} \u2211_{li,lj\u2208L} wab d(li, lj) \u02c6ya(i) \u02c6yb(j) ] \u2264 \u2211_{Xa\u2208X} \u2211_{li\u2208L} \u03b8a(li) y\u2217a(i) + F \u2211_{(Xa,Xb)\u2208E} \u2211_{li,lj\u2208L} wab d(li, lj) y\u2217ab(i, j).\n\nThe above inequality follows directly from properties (2) and (3). Similar to multiplicative bounds, approximation factors are always greater than or equal to 1, and are invariant to reparameterizations of the unary potentials. An approximation factor F is said to be tight if the above inequality holds as an equality for some value of unary potentials and edge weights.\n\nSubmodular Energy Function. 
We will use the following important fact throughout this paper.\nGiven an energy function de\ufb01ned using arbitrary unary potentials, non-negative edge weights and a\nsubmodular distance function, an optimal labeling can be computed in polynomial time by solving\nan equivalent minimum st-cut problem [6]. Recall that a submodular distance function d\u2032 over a\nlabel set L = {l1, l2, \u00b7 \u00b7 \u00b7 , lh} satis\ufb01es the following properties: (i) d\u2032(li, lj) \u2265 0 for all li, lj \u2208 L,\nand d\u2032(li, lj) = 0 if and only if i = j; and (ii) d\u2032(li, lj) + d\u2032(li+1, lj+1) \u2264 d\u2032(li, lj+1) + d\u2032(li+1, lj)\nfor all li, lj \u2208 L\\{lh} (where \\ refers to set difference).\n\n3 Complete Rounding and Complete Move\n\nWe start with a simple rounding scheme, which we call complete rounding. While complete round-\ning is not very accurate, it would help illustrate the \ufb02avor of our results. We will subsequently\nconsider its generalizations, which have been useful in obtaining the best-known approximation\nfactors for various special cases of metric labeling.\n\nThe complete rounding procedure consists of a single stage where we use the set of all unary vari-\nables to obtain a labeling (as opposed to other rounding procedures discussed subsequently). Al-\ngorithm 1 describes its main steps. Intuitively, it treats the value of the unary variable ya(i) as the\nprobability of assigning the label li to the random variable Xa. It obtains a labeling by sampling\nfrom all the distributions ya = [ya(i), \u2200li \u2208 L] simultaneously using the same random number.\n\nIt can be shown that using a different random number to sample the distributions ya and yb of\ntwo neighboring random variables (Xa, Xb) \u2208 E results in an in\ufb01nite approximation factor. For\nexample, let ya(i) = yb(i) = 1/h for all li \u2208 L, where h is the number of labels. 
The pairwise variables yab that minimize the energy function are yab(i, i) = 1/h and yab(i, j) = 0 when i \u2260 j. For the above feasible solution of the LP relaxation, the RHS of inequality (3) is 0 for any finite F, while the LHS of inequality (3) is strictly greater than 0 if h > 1. However, we will shortly show that using the same random number r for all random variables provides a finite approximation factor.\n\nAlgorithm 1 The complete rounding procedure.\ninput A feasible solution y of the LP relaxation.\n1: Pick a real number r uniformly from [0, 1].\n2: for all Xa \u2208 X do\n3: Define Ya(0) = 0 and Ya(i) = \u2211_{j=1}^{i} ya(j) for all li \u2208 L.\n4: Assign the label li \u2208 L to the random variable Xa if Ya(i \u2212 1) < r \u2264 Ya(i).\n5: end for\n\nWe now turn our attention to designing a move-making algorithm whose multiplicative bound matches the approximation factor of the complete rounding procedure. To this end, we modify the range expansion algorithm proposed in [15] for truncated convex pairwise potentials to a general (semi-)metric distance function. Our method, which we refer to as the complete move-making algorithm, considers all putative labels of all random variables, and provides an approximate solution in a single iteration. Algorithm 2 describes its two main steps. First, it computes a submodular overestimation of the given distance function by solving the following optimization problem:\n\nd\u0304 = argmin_{d\u2032} t (4)\ns.t. d\u2032(li, lj) \u2264 t d(li, lj), \u2200li, lj \u2208 L,\nd\u2032(li, lj) \u2265 d(li, lj), \u2200li, lj \u2208 L,\nd\u2032(li, lj) + d\u2032(li+1, lj+1) \u2264 d\u2032(li, lj+1) + d\u2032(li+1, lj), \u2200li, lj \u2208 L\\{lh}.\n\nThe above problem minimizes the maximum ratio of the estimated distance to the original distance over all pairs of labels, that is, max_{i\u2260j} d\u2032(li, lj)/d(li, lj). 
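Problem (4) is itself a small linear program, so its optimal value can be computed with any off-the-shelf LP solver. The sketch below uses scipy.optimize.linprog; the solver choice and the variable layout (the scalar t followed by the h\u00d7h entries of d\u2032) are assumptions of this illustration, not part of the paper.

```python
import numpy as np
from scipy.optimize import linprog

def submodular_overestimate(d):
    """Solve problem (4): minimise t subject to d <= d' <= t*d and the
    submodularity constraints on d'.  Variable vector: [t, d'(0,0), ..., d'(h-1,h-1)]."""
    h = d.shape[0]
    n = 1 + h * h

    def idx(i, j):           # position of d'(i, j) in the variable vector
        return 1 + i * h + j

    A, b = [], []
    for i in range(h):
        for j in range(h):
            row = np.zeros(n)
            row[idx(i, j)], row[0] = 1.0, -d[i, j]
            A.append(row); b.append(0.0)        # d'(i,j) - t*d(i,j) <= 0
            row = np.zeros(n)
            row[idx(i, j)] = -1.0
            A.append(row); b.append(-d[i, j])   # d'(i,j) >= d(i,j)
    for i in range(h - 1):
        for j in range(h - 1):                  # submodularity of d'
            row = np.zeros(n)
            row[idx(i, j)] = row[idx(i + 1, j + 1)] = 1.0
            row[idx(i, j + 1)] = row[idx(i + 1, j)] = -1.0
            A.append(row); b.append(0.0)
    c = np.zeros(n)
    c[0] = 1.0                                  # minimise t
    res = linprog(c, A_ub=np.array(A), b_ub=b, bounds=[(0, None)] * n)
    return res.x[0], res.x[1:].reshape(h, h)

# Truncated linear distance d(i,j) = min(|i-j|, M) with h = 4 labels and M = 1:
# as discussed in Section 4, the tightest submodular overestimate is the linear
# distance, so the distortion is (h-1)/M = 3.
h, M = 4, 1
d = np.minimum(np.abs(np.subtract.outer(np.arange(h), np.arange(h))), M).astype(float)
t, d_bar = submodular_overestimate(d)
print(round(t, 6))  # → 3.0
```

Since problem (4) does not depend on the unary potentials or the edge weights, this computation is a one-off preprocessing step for a given distance function.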
We will refer to the optimal value of problem (4) as the submodular distortion of the distance function d. Second, it replaces the original distance function by the submodular overestimation and computes an approximate solution to the original metric labeling problem by solving a single minimum st-cut problem. Note that, unlike the range expansion algorithm [15] that uses the readily available submodular overestimation of a truncated convex distance (namely, the corresponding convex distance function), our approach estimates the submodular overestimation via the LP (4). Since the LP (4) can be solved for any arbitrary distance function, it makes complete move-making more generally applicable.\n\nAlgorithm 2 The complete move-making algorithm.\ninput Unary potentials \u03b8a(\u00b7), edge weights wab, distance function d.\n1: Compute a submodular overestimation d\u0304 of d by solving problem (4).\n2: Using the approach of [6], solve the following problem via an equivalent minimum st-cut problem:\n\n\u02c6x = argmin_{x\u2208L^n} \u2211_{Xa\u2208X} \u03b8a(xa) + \u2211_{(Xa,Xb)\u2208E} wab d\u0304(xa, xb).\n\nThe following theorem establishes the theoretical guarantees of the complete move-making algorithm and the complete rounding procedure.\n\nTheorem 1. The tight multiplicative bound of the complete move-making algorithm is equal to the submodular distortion of the distance function. Furthermore, the tight approximation factor of the complete rounding procedure is also equal to the submodular distortion of the distance function.\n\nIn terms of computational complexities, complete move-making is significantly faster than solving the LP relaxation. 
Specifically, given an MRF with n random variables and m edges, and a label set with h labels, the LP relaxation requires at least O(m^3 h^3 log(m^2 h^3)) time, since it consists of O(mh^2) optimization variables and O(mh) constraints. In contrast, complete move-making requires O(nmh^3 log(m)) time, since the graph constructed using the method of [6] consists of O(nh) nodes and O(mh^2) arcs. Note that complete move-making also requires us to solve the linear program (4). However, since problem (4) is independent of the unary potentials and the edge weights, it only needs to be solved once beforehand in order to compute the approximate solution for any metric labeling problem defined using the distance function d.\n\n4 Interval Rounding and Interval Moves\n\nTheorem 1 implies that the approximation factor of the complete rounding procedure is very large for distance functions that are highly non-submodular. For example, consider the truncated linear distance function defined as follows over a label set L = {l1, l2, \u00b7\u00b7\u00b7 , lh}:\n\nd(li, lj) = min{|i \u2212 j|, M}.\n\nHere, M is a user-specified parameter that determines the maximum distance. The tightest submodular overestimation of the above distance function is the linear distance function, that is, d\u0304(li, lj) = |i \u2212 j|. This implies that the submodular distortion of the truncated linear metric is (h \u2212 1)/M, and therefore, the approximation factor for the complete rounding procedure is also (h \u2212 1)/M. In order to avoid this large approximation factor, Chekuri et al. [5] proposed an interval rounding procedure, which captures the intuition that it is beneficial to assign similar labels to as many random variables as possible.\n\nAlgorithm 3 provides a description of interval rounding. The rounding procedure chooses an interval of at most q consecutive labels (step 2). 
It generates a random number r (step 3), and uses it to attempt to assign labels to previously unlabeled random variables from the selected interval (steps 4-7). It can be shown that the overall procedure converges in a polynomial number of iterations with a probability of 1 [5]. Note that if we fix q = h and z = 1, interval rounding becomes equivalent to complete rounding. However, the analyses in [5, 10] show that other values of q provide better approximation factors for various special cases.\n\nAlgorithm 3 The interval rounding procedure.\ninput A feasible solution y of the LP relaxation.\n1: repeat\n2: Pick an integer z uniformly from [\u2212q + 2, h]. Define an interval of labels I = {ls, \u00b7\u00b7\u00b7 , le}, where s = max{z, 1} is the start index and e = min{z + q \u2212 1, h} is the end index.\n3: Pick a real number r uniformly from [0, 1].\n4: for all Unlabeled random variables Xa do\n5: Define Ya(0) = 0 and Ya(i) = \u2211_{j=s}^{s+i\u22121} ya(j) for all i \u2208 {1, \u00b7\u00b7\u00b7 , e \u2212 s + 1}.\n6: Assign the label ls+i\u22121 \u2208 I to Xa if Ya(i \u2212 1) < r \u2264 Ya(i).\n7: end for\n8: until All random variables have been assigned a label.\n\nOur goal is to design a move-making algorithm whose multiplicative bound matches the approximation factor of interval rounding for any choice of q. To this end, we propose the interval move-making algorithm that generalizes the range expansion algorithm [15], originally proposed for truncated convex distances, to arbitrary distance functions. Algorithm 4 provides its main steps. The central idea of the method is to improve a given labeling \u02c6x by allowing each random variable Xa to either retain its current label \u02c6xa or to choose a new label from an interval of consecutive labels. In more detail, let I = {ls, \u00b7\u00b7\u00b7 , le} \u2286 L be an interval of labels of length at most q (step 4). 
For the sake of simplicity, let us assume that \u02c6xa \u2209 I for any random variable Xa. We define Ia = I \u222a {\u02c6xa} (step 5). For each pair of neighboring random variables (Xa, Xb) \u2208 E, we compute a submodular distance function d\u02c6xa,\u02c6xb : Ia \u00d7 Ib \u2192 R+ by solving the following linear program (step 6):\n\nd\u02c6xa,\u02c6xb = argmin_{d\u2032} t (5)\ns.t. d\u2032(li, lj) \u2264 t d(li, lj), \u2200li \u2208 Ia, lj \u2208 Ib,\nd\u2032(li, lj) \u2265 d(li, lj), \u2200li \u2208 Ia, lj \u2208 Ib,\nd\u2032(li, lj) + d\u2032(li+1, lj+1) \u2264 d\u2032(li, lj+1) + d\u2032(li+1, lj), \u2200li, lj \u2208 I\\{le},\nd\u2032(li, le) + d\u2032(li+1, \u02c6xb) \u2264 d\u2032(li, \u02c6xb) + d\u2032(li+1, le), \u2200li \u2208 I\\{le},\nd\u2032(le, lj) + d\u2032(\u02c6xa, lj+1) \u2264 d\u2032(le, lj+1) + d\u2032(\u02c6xa, lj), \u2200lj \u2208 I\\{le},\nd\u2032(le, le) + d(\u02c6xa, \u02c6xb) \u2264 d\u2032(le, \u02c6xb) + d\u2032(\u02c6xa, le).\n\nSimilar to problem (4), the above problem minimizes the maximum ratio of the estimated distance to the original distance. However, instead of introducing constraints for all pairs of labels, it only considers pairs of labels li and lj where li \u2208 Ia and lj \u2208 Ib. Furthermore, it does not modify the distance between the current labels \u02c6xa and \u02c6xb (as can be seen in the last constraint of problem (5)).\n\nGiven the submodular distance functions d\u02c6xa,\u02c6xb, we can compute a new labeling x by solving the following optimization problem via minimum st-cut using the method of [6] (step 7):\n\nx = argmin_x \u2211_{Xa\u2208X} \u03b8a(xa) + \u2211_{(Xa,Xb)\u2208E} wab d\u02c6xa,\u02c6xb(xa, xb) (6)\ns.t. xa \u2208 Ia, \u2200Xa \u2208 X.\n\nIf the energy of the new labeling x is less than that of the current labeling \u02c6x, then we update our labeling to x (steps 8-10). 
Otherwise, we retain the current estimate of the labeling and consider another interval. The algorithm converges when the energy does not decrease for any interval of length at most q. Note that, once again, the main difference between interval move-making and the range expansion algorithm is the use of an appropriate optimization problem, namely the LP (5), to obtain a submodular overestimation of the given distance function. This allows us to use interval move-making for the general metric labeling problem, instead of focusing on only truncated convex models.\n\nAlgorithm 4 The interval move-making algorithm.\ninput Unary potentials \u03b8a(\u00b7), edge weights wab, distance function d, initial labeling x0.\n1: Set current labeling to initial labeling, that is, \u02c6x = x0.\n2: repeat\n3: for all z \u2208 [\u2212q + 2, h] do\n4: Define an interval of labels I = {ls, \u00b7\u00b7\u00b7 , le}, where s = max{z, 1} is the start index and e = min{z + q \u2212 1, h} is the end index.\n5: Define Ia = I \u222a {\u02c6xa} for all random variables Xa \u2208 X.\n6: Obtain submodular overestimates d\u02c6xa,\u02c6xb for each pair of neighboring random variables (Xa, Xb) \u2208 E by solving problem (5).\n7: Obtain a new labeling x by solving problem (6).\n8: if Energy of x is less than energy of \u02c6x then\n9: Update \u02c6x = x.\n10: end if\n11: end for\n12: until Energy cannot be decreased further.\n\nThe following theorem establishes the theoretical guarantees of the interval move-making algorithm and the interval rounding procedure.\n\nTheorem 2. The tight multiplicative bound of the interval move-making algorithm is equal to the tight approximation factor of the interval rounding procedure.\n\nAn interval move-making algorithm that uses an interval length of q runs for at most O(h/q) iterations. This follows from a simple modification of the result by Gupta and Tardos [8] (specifically, theorem 3.7). 
Hence, the total time complexity of interval move-making is O(nmhq^2 log(m)), since each iteration solves a minimum st-cut problem of a graph with O(nq) nodes and O(mq^2) arcs. In other words, interval move-making is at most as computationally complex as complete move-making, which in turn is significantly less complex than solving the LP relaxation. Note that problem (5), which is required for interval move-making, is independent of the unary potentials and the edge weights. Hence, it only needs to be solved once beforehand for all pairs of labels (\u02c6xa, \u02c6xb) \u2208 L \u00d7 L in order to obtain a solution for any metric labeling problem defined using the distance function d.\n\n5 Hierarchical Rounding and Hierarchical Moves\n\nWe now consider the most general form of parallel rounding that has been proposed in the literature, namely the hierarchical rounding procedure [10]. The rounding relies on a hierarchical clustering of the labels. Formally, we denote a hierarchical clustering of m levels for the label set L by C = {C(i), i = 1, \u00b7\u00b7\u00b7 , m}. At each level i, the clustering C(i) = {C(i, j) \u2286 L, j = 1, \u00b7\u00b7\u00b7 , h_i} is mutually exclusive and collectively exhaustive, that is,\n\n\u222a_j C(i, j) = L, C(i, j) \u2229 C(i, j\u2032) = \u2205, \u2200j \u2260 j\u2032.\n\nFurthermore, for each cluster C(i, j) at a level i \u2265 2, there exists a unique cluster C(i \u2212 1, j\u2032) in the level i \u2212 1 such that C(i, j) \u2286 C(i \u2212 1, j\u2032). We call the cluster C(i \u2212 1, j\u2032) the parent of the cluster C(i, j) and define p(i, j) = j\u2032. Similarly, we call C(i, j) a child of C(i \u2212 1, j\u2032). 
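This parent/child bookkeeping can be made concrete with a small helper that validates the partition property at every level and recovers the parent index p(i, j). The encoding below (a list of levels, each a list of frozensets, with 0-based indices) is an illustrative assumption, not the paper's notation.

```python
def parent_indices(clustering):
    """clustering[i] lists the clusters (sets of labels) at level i+1 of the
    hierarchy.  Checks that each level is mutually exclusive and collectively
    exhaustive, then returns p[(i, j)] = j' such that clustering[i][j] is
    contained in clustering[i-1][j']."""
    labels = set().union(*clustering[0])
    p = {}
    for i, level in enumerate(clustering):
        # mutually exclusive and collectively exhaustive at every level
        assert sum(len(c) for c in level) == len(labels)
        assert set().union(*level) == labels
        if i == 0:
            continue
        for j, cluster in enumerate(level):
            # the unique level-(i-1) cluster containing this cluster
            (jp,) = [k for k, c in enumerate(clustering[i - 1]) if cluster <= c]
            p[(i, j)] = jp
    return p

# A 3-level hierarchy over 4 labels: one root, two groups, then singletons.
C = [
    [frozenset({0, 1, 2, 3})],
    [frozenset({0, 1}), frozenset({2, 3})],
    [frozenset({0}), frozenset({1}), frozenset({2}), frozenset({3})],
]
print(parent_indices(C))
# → {(1, 0): 0, (1, 1): 0, (2, 0): 0, (2, 1): 0, (2, 2): 1, (2, 3): 1}
```

The tuple unpacking `(jp,) = [...]` doubles as a check that the parent is unique, mirroring the uniqueness requirement in the definition above.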
Without loss of generality, we assume that there exists a single cluster at level 1 that contains all the labels, and that each cluster at level m contains a single label.\n\nAlgorithm 5 The hierarchical rounding procedure.\ninput A feasible solution y of the LP relaxation.\n1: Define f^1_a = 1 for all Xa \u2208 X.\n2: for all i \u2208 {2, \u00b7\u00b7\u00b7 , m} do\n3: for all Xa \u2208 X do\n4: Define z^i_a(j) for all j \u2208 {1, \u00b7\u00b7\u00b7 , h_i} as follows: z^i_a(j) = \u2211_{k, lk\u2208C(i,j)} ya(k) if p(i, j) = f^{i\u22121}_a, and z^i_a(j) = 0 otherwise.\n5: Define y^i_a(j) for all j \u2208 {1, \u00b7\u00b7\u00b7 , h_i} as follows: y^i_a(j) = z^i_a(j) / \u2211_{j\u2032=1}^{h_i} z^i_a(j\u2032).\n6: end for\n7: Using a rounding procedure (complete or interval) on y^i = [y^i_a(j), \u2200Xa \u2208 X, j \u2208 {1, \u00b7\u00b7\u00b7 , h_i}], obtain an integer solution \u02c6y^i.\n8: for all Xa \u2208 X do\n9: Let ka \u2208 {1, \u00b7\u00b7\u00b7 , h_i} be such that \u02c6y^i_a(ka) = 1. Define f^i_a = ka.\n10: end for\n11: end for\n12: for all Xa \u2208 X do\n13: Let lk be the unique label present in the cluster C(m, f^m_a). Assign lk to Xa.\n14: end for\n\nAlgorithm 5 describes the hierarchical rounding procedure. Given a clustering C, it proceeds in a top-down fashion through the hierarchy while assigning each random variable to a cluster in the current level. Let f^i_a be the index of the cluster assigned to the random variable Xa in the level i. In the first step, the rounding procedure assigns all the random variables to the unique cluster C(1, 1) (step 1). At each step i, it assigns each random variable to a unique cluster in the level i by computing a conditional probability distribution as follows. The conditional probability y^i_a(j) of assigning the random variable Xa to the cluster C(i, j) is proportional to \u2211_{lk\u2208C(i,j)} ya(k) if p(i, j) = f^{i\u22121}_a, and y^i_a(j) = 0 if p(i, j) \u2260 f^{i\u22121}_a (steps 3-6); that is, a random variable cannot be assigned to a cluster C(i, j) if it wasn\u2019t assigned to its parent in the previous step. Using a rounding procedure (complete or interval) for y^i, we obtain an assignment of random variables to the clusters at level i (step 7). Once such an assignment is obtained, the values f^i_a are computed for all random variables Xa (steps 8-10). At the end of step m, hierarchical rounding would have assigned each random variable to a unique cluster in the level m. Since each cluster at level m consists of a single label, this provides us with a labeling of the MRF (steps 12-14).\n\nOur goal is to design a move-making algorithm whose multiplicative bound matches the approximation factor of the hierarchical rounding procedure for any choice of hierarchical clustering C. To this end, we propose the hierarchical move-making algorithm, which extends the hierarchical graph cuts approach for hierarchically well-separated tree (HST) metrics proposed in [14]. Algorithm 6 provides its main steps. In contrast to hierarchical rounding, the move-making algorithm traverses the hierarchy in a bottom-up fashion while computing a labeling for each cluster in the current level. Let x^{i,j} be the labeling corresponding to the cluster C(i, j). At the first step, when considering the level m of the clustering, all the random variables are assigned the same label. Specifically, x^{m,j}_a is equal to the unique label contained in the cluster C(m, j) (steps 1-3). At step i, the algorithm computes the labeling x^{m\u2212i+1,j} for each cluster C(m \u2212 i + 1, j) by using the labelings computed in the previous step. Specifically, it restricts the label assigned to a random variable Xa in the labeling x^{m\u2212i+1,j} to the subset of labels that were assigned to it by the labelings corresponding to the children of C(m \u2212 i + 1, j) (step 6). Under this restriction, the labeling x^{m\u2212i+1,j} is computed by approximately minimizing the energy using a move-making algorithm (step 7). Implicit in our description is the assumption that we will use a move-making algorithm (complete or interval) in step 7 of Algorithm 6 whose multiplicative bound matches the approximation factor of the rounding procedure (complete or interval) used in step 7 of Algorithm 5. Note that, unlike the hierarchical graph cuts approach [14], the hierarchical move-making algorithm can be used for any arbitrary clustering and not just the one specified by an HST metric.\n\nAlgorithm 6 The hierarchical move-making algorithm.\ninput Unary potentials \u03b8a(\u00b7), edge weights wab, distance function d.\n1: for all j \u2208 {1, \u00b7\u00b7\u00b7 , h} do\n2: Let lk be the unique label in the cluster C(m, j). Define x^{m,j}_a = lk for all Xa \u2208 X.\n3: end for\n4: for all i \u2208 {2, \u00b7\u00b7\u00b7 , m} do\n5: for all j \u2208 {1, \u00b7\u00b7\u00b7 , h_{m\u2212i+1}} do\n6: Define L^{m\u2212i+1,j}_a = {x^{m\u2212i+2,j\u2032}_a, p(m \u2212 i + 2, j\u2032) = j, j\u2032 \u2208 {1, \u00b7\u00b7\u00b7 , h_{m\u2212i+2}}}.\n7: Using a move-making algorithm (complete or interval), compute the labeling x^{m\u2212i+1,j} under the constraint x^{m\u2212i+1,j}_a \u2208 L^{m\u2212i+1,j}_a.\n8: end for\n9: end for\n10: The final solution is x^{1,1}.\n\nThe following theorem establishes the theoretical guarantees of the hierarchical move-making algorithm and the hierarchical rounding procedure.\n\nTheorem 3. 
The tight multiplicative bound of the hierarchical move-making algorithm is equal to\nthe tight approximation factor of the hierarchical rounding procedure.\n\nNote that hierarchical move-making solves a series of problems de\ufb01ned on a smaller label set. Since\nthe complexity of complete and interval move-making is superlinear in the number of labels, it can\nbe veri\ufb01ed that the hierarchical move-making algorithm is at most as computationally complex as\nthe complete move-making algorithm (corresponding to the case when the clustering consists of\nonly one cluster that contains all the labels). Hence, hierarchical move-making is signi\ufb01cantly faster\nthan solving the LP relaxation.\n\n6 Discussion\n\nFor any general distance function that can be used to specify the (semi-)metric labeling problem, we\nproved that the approximation factor of a large family of parallel rounding procedures is matched by\nthe multiplicative bound of move-making algorithms. This generalizes previously known results on\nthe guarantees of move-making algorithms in two ways: (i) in contrast to previous results [14, 15, 20]\nthat focused on special cases of distance functions, our results are applicable to arbitrary semi-metric\ndistance functions; and (ii) the guarantees provided by our theorems are tight. Our experiments\n(described in the technical report) con\ufb01rm that the rounding-based move-making algorithms provide\nsimilar accuracy to the LP relaxation, while being signi\ufb01cantly faster due to the use of ef\ufb01cient\nminimum st-cut solvers.\n\nSeveral natural questions arise. What is the exact characterization of the rounding procedures for\nwhich it is possible to design matching move-making algorithms? Can we design rounding-based\nmove-making algorithms for other combinatorial optimization problems? 
Answering these questions will not only expand our theoretical understanding, but also result in the development of efficient and accurate algorithms.

Acknowledgements. This work is funded by the European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013)/ERC Grant agreement number 259112.

References

[1] A. Archer, J. Fakcharoenphol, C. Harrelson, R. Krauthgamer, K. Talwar, and E. Tardos. Approximate classification via earthmover metrics. In SODA, 2004.
[2] Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. PAMI, 2004.
[3] Y. Boykov, O. Veksler, and R. Zabih. Markov random fields with efficient approximations. In CVPR, 1998.
[4] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. In ICCV, 1999.
[5] C. Chekuri, S. Khanna, J. Naor, and L. Zosin. Approximation algorithms for the metric labeling problem via a new linear programming formulation. In SODA, 2001.
[6] B. Flach and D. Schlesinger. Transforming an arbitrary minsum problem into a binary one. Technical report, TU Dresden, 2006.
[7] A. Globerson and T. Jaakkola. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In NIPS, 2007.
[8] A. Gupta and E. Tardos. A constant factor approximation algorithm for a class of classification problems. In STOC, 2000.
[9] T. Hazan and A. Shashua. Convergent message-passing algorithms for inference over general graphs with convex free energy. In UAI, 2008.
[10] J. Kleinberg and E. Tardos. Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields. In STOC, 1999.
[11] V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. PAMI, 2006.
[12] N. Komodakis, N. Paragios, and G. Tziritas.
MRF optimization via dual decomposition: Message-passing revisited. In ICCV, 2007.
[13] A. Koster, C. van Hoesel, and A. Kolen. The partial constraint satisfaction problem: Facets and lifting theorems. Operations Research Letters, 1998.
[14] M. P. Kumar and D. Koller. MAP estimation of semi-metric MRFs via hierarchical graph cuts. In UAI, 2009.
[15] M. P. Kumar and P. Torr. Improved moves for truncated convex models. In NIPS, 2008.
[16] P. Ravikumar, A. Agarwal, and M. Wainwright. Message-passing for graph-structured linear programs: Proximal projections, convergence and rounding schemes. In ICML, 2008.
[17] M. Schlesinger. Syntactic analysis of two-dimensional visual signals in noisy conditions. Kibernetika, 1976.
[18] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother. A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. PAMI, 2008.
[19] D. Tarlow, D. Batra, P. Kohli, and V. Kolmogorov. Dynamic tree block coordinate ascent. In ICML, 2011.
[20] O. Veksler. Efficient graph-based energy minimization methods in computer vision. PhD thesis, Cornell University, 1999.
[21] O. Veksler. Graph cut based optimization for MRFs with truncated convex priors. In CVPR, 2007.
[22] M. Wainwright, T. Jaakkola, and A. Willsky. MAP estimation via agreement on trees: Message passing and linear programming. Transactions on Information Theory, 2005.
[23] Y. Weiss, C. Yanover, and T. Meltzer. MAP estimation, linear programming and belief propagation with convex free energies. In UAI, 2007.
[24] T. Werner. A linear programming approach to max-sum problem: A review. PAMI, 2007.
[25] T. Werner. Revisiting the linear programming relaxation approach to Gibbs energy minimization and weighted constraint satisfaction. PAMI, 2010.