{"title": "MAP estimation in Binary MRFs via Bipartite Multi-cuts", "book": "Advances in Neural Information Processing Systems", "page_first": 955, "page_last": 963, "abstract": "", "full_text": "MAP estimation in Binary MRFs via Bipartite Multi-cuts\r\nSashank J. Reddi IIT Bombay sashank@cse.iitb.ac.in Sunita Sarawagi IIT Bombay sunita@cse.iitb.ac.in Sundar Vishwanathan IIT Bombay sundar@cse.iitb.ac.in\r\n\r\nAbstract\r\nWe propose a new LP relaxation for obtaining the MAP assignment of a binary MRF with pairwise potentials. Our relaxation is derived from reducing the MAP assignment problem to an instance of a recently proposed Bipartite Multi-cut problem where the LP relaxation is guaranteed to provide an O(log k) approximation where k is the number of vertices adjacent to non-submodular edges in the MRF. We then propose a combinatorial algorithm to efficiently solve the LP and also provide a lower bound by concurrently solving its dual to within an approximation. The algorithm is up to an order of magnitude faster and provides better MAP scores and bounds than the state-of-the-art message passing algorithm of [1] that tightens the local marginal polytope with third-order marginal constraints.\r\n\r\n1 Introduction\r\n\r\nWe consider a pairwise Markov Random Field (MRF) over $n$ binary variables $x = x_1, \\ldots, x_n$ expressed as a graph $G = (V, E)$ and an energy function $E(x|\\theta)$ whose parameters decompose over its vertices and edges as:\r\n\r\n$$E(x|\\theta) = \\sum_{i \\in V} \\theta_i(x_i) + \\sum_{(i,j) \\in E} \\theta_{ij}(x_i, x_j) + \\text{const} \\qquad (1)$$\r\n\r\nOur goal is to find $x^* = \\arg\\min_{x \\in \\{0,1\\}^n} E(x|\\theta)$. This is called the MAP assignment problem in graphical models and for general graphs and arbitrary parameters is NP-complete. Consequently, there is an extensive literature of approximation schemes for the problem and new algorithms continue to be explored [2, 3, 4, 5, 6, 7, 8]. The most popular of these are based on the following linear programming relaxation of the MAP problem. 
\r\n\r\n$$\\min_{\\mu} \\sum_{i, x_i} \\theta_i(x_i) \\mu_i(x_i) + \\sum_{(i,j), x_i, x_j} \\theta_{ij}(x_i, x_j) \\mu_{ij}(x_i, x_j) \\qquad (2)$$\r\n$$\\sum_{x_j} \\mu_{ij}(x_i, x_j) = \\mu_i(x_i) \\quad \\forall (i,j) \\in E, \\; x_i \\in \\{0,1\\}$$\r\n$$\\sum_{x_i} \\mu_i(x_i) = 1 \\quad \\forall i \\in V$$\r\n$$\\mu_{ij}(x_i, x_j) \\geq 0 \\quad \\forall (i,j) \\in E, \\; x_i, x_j \\in \\{0,1\\}$$\r\n\r\nBroadly, two main techniques are used to solve this relaxation: message-passing algorithms [9, 10, 11, 7, 12] such as TRW-S and Max-sum diffusion on the dual, and combinatorial algorithms based on graph cuts and network flows [13, 14]. Both these methods find the exact MAP when the edge parameters are submodular. For non-submodular parameters, these methods provide partial optimality guarantees for variables that get integral values. This observation is exploited in [14] to design an iterative probing scheme to expand the set of variables with optimal assignments. However, this scheme is useful only when the graphical model has a few non-submodular edges.\r\n\r\nFootnote: The author is currently affiliated with Google Inc.\r\n\r\nMore principled methods to improve the solution output by the relaxed LP are based on progressively tightening the relaxation with violated constraints. Cycle constraints [15, 16, 17, 18, 1, 19] and higher order marginal constraints [17, 1, 20] are two such types of constraints. However, these are not backed by efficient algorithms and thus most of these tightenings come at a considerable computational cost. In this paper we propose a new relaxation of the MAP estimation problem via reduction to a recently proposed Bipartite Multi-cut problem in undirected graphs [21]. We exploit this to show that after adding a polynomial number of constraints, we get an $O(\\log k)$ approximation guarantee on the MAP objective where $k$ is the number of variables adjacent to non-submodular edges in the graphical model, and this can be tightened to $O(\\sqrt{\\log k} \\, \\log\\log k)$ using a semi-definite programming relaxation (see footnote 1). In this paper we explore only LP-based relaxation since our goal is to design practical algorithms. 
We propose a combinatorial algorithm to efficiently solve this LP by casting it as a Multi-cut problem on a specially constructed graph, the dual of which is a multi-commodity flow problem. The algorithm, adapted from [22, 23], simultaneously updates the primal and dual solutions, and thus at any point provides both a candidate solution and a lower bound to the energy function. It is guaranteed to provide an $\\epsilon$-approximate solution of the primal LP in $O(\\epsilon^{-2}(|V|+|E|)^2)$ time but in practice terminates much faster. No such guarantees exist for any of the existing algorithms for tightening the MAP LP based on cycle or higher order marginal constraints. Empirically, this algorithm is an order of magnitude faster than the state-of-the-art message passing algorithm [1] while yielding the same or better MAP values and bounds. We show that our LP is a relaxation of the LP with cycle constraints, but we still yield better and faster bounds because our combinatorial algorithm solves the LP within a guaranteed approximation.\r\n\r\n2 MAP estimation as Bipartite Multi-cut\r\n\r\nWe assume a reparameterization of the energy function so that the parameters of $E(x|\\theta)$ (Equation 1) are\r\n1. Symmetric, that is, for $x_i, x_j \\in \\{0,1\\}$, $\\theta_{ij}(x_i, x_j) = \\theta_{ij}(\\bar{x}_i, \\bar{x}_j)$ where $\\bar{x}_i = 1 - x_i$,\r\n2. Zero-normalized, that is, $\\min_{x_i} \\theta_i(x_i) = 0$ and $\\min_{x_i, x_j} \\theta_{ij}(x_i, x_j) = 0$.\r\n\r\nIt is easy to see that any energy function over binary variables can be reparameterized in this form (see footnote 2). Our starting point is the LP relaxation proposed in [13] for approximating MAP $x^* = \\arg\\min_x E(x|\\theta)$ as the minimum s-t cut in a suitably constructed graph $H = (V^H, E^H)$. We present this construction for completeness.\r\n\r\n2.1 Graph cut-based relaxation of [13]\r\n\r\nFor ease of notation, first augment the n variables with a special \"0\" variable that always takes a label of 0 and has an edge to all n variables. This enables us to redefine the node parameters $\\theta_i(x_i)$ as edge parameters $\\theta_{0i}(0, x_i)$. 
Add to $H$ two vertices $i_0$ and $i_1$ for each variable $i$, $0 \\leq i \\leq n$. For each edge $(i,j) \\in E$, add an edge between $i_0$ and $j_0$ with weight $\\theta_{ij}(0,1)$ if the edge is submodular, else add edge $(i_0, j_1)$ with weight $\\theta_{ij}(0,0)$. For every vertex $i$, if $\\theta_i(1)$ is non-zero add an edge between $0_0$ and $i_0$ with weight $\\theta_i(1)$, else add an edge between $0_1$ and $i_0$ with weight $\\theta_i(0)$. It is easy to see that the MAP problem $\\min_{x \\in \\{0,1\\}^n} E(x)$ is equivalent to solving the following program if all variables are further constrained to take integral values (with $D(i_0) \\equiv x_i$).\r\n\r\n(Min-cut LP)\r\n$$\\min_{d_e, D(\\cdot)} \\sum_{e \\in E^H} w_e d_e$$\r\n$$d_e + D(i_s) - D(j_t) \\geq 0 \\quad \\forall e = (i_s, j_t) \\in E^H$$\r\n$$d_e + D(j_t) - D(i_s) \\geq 0 \\quad \\forall e = (i_s, j_t) \\in E^H$$\r\n$$D(0_0) = 0$$\r\n$$D(i_s) \\in [0,1] \\quad \\forall i_s \\in V^H$$\r\n$$d_e \\in [0,1] \\quad \\forall e \\in E^H$$\r\n$$D(i_0) + D(i_1) = 1 \\quad \\forall i \\in \\{0, \\ldots, n\\}$$\r\n\r\nFootnote 1: We note however that these multiplicative bounds may not be relevant for MAP estimation problem in graphical models where reparameterization leaves behind negative constants which are kept outside the LP objective.\r\nFootnote 2: Set $\\theta'_{ij}(0,0) = \\theta'_{ij}(1,1) = (\\theta_{ij}(0,0) + \\theta_{ij}(1,1))/2$, $\\theta'_{ij}(0,1) = \\theta'_{ij}(1,0) = (\\theta_{ij}(0,1) + \\theta_{ij}(1,0))/2$, $\\theta'_i(1) = \\theta_i(1) + \\sum_{(i,j) \\in E} (\\theta_{ij}(1,0) + \\theta_{ij}(1,1) - \\theta_{ij}(0,1) - \\theta_{ij}(0,0))/2$, $\\text{const}' = \\text{const} + \\sum_{(i,j) \\in E} (\\theta_{ij}(0,0) - \\theta_{ij}(1,1))/2$. Then zero normalize as in [9].\r\n\r\nAn efficient way to solve this LP exactly is by finding an s-t Min-cut in $H$ with $(s, t)$ as $(0_0, 0_1)$ and setting $D(i_0) = 1/2$ when both $i_0$ and $i_1$ fall on the same side, otherwise setting it to 0 or 1 depending on whether $i_0$ or $i_1$ is on the $0_0$ side [13, 14]. It is easy to see that this LP is equivalent to the basic LP relaxation in Equation 2 for which many alternative algorithms have been proposed [3, 6, 7, 9, 11]. On graphs with many cycles containing an odd number of non-submodular edges, this method yields poor MAP assignments. 
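The symmetric reparameterization described in footnote 2 can be checked numerically on a single edge. The sketch below is only an illustration under stated assumptions: the function names `symmetrize` and `energy` are hypothetical, the update for the second endpoint j is assumed to mirror the one the footnote states for i, and the final zero-normalization step is omitted.

```python
def symmetrize(theta_i, theta_j, theta_ij, const=0.0):
    # theta_ij[a][b] is the edge energy for labels (x_i, x_j) = (a, b).
    t00, t01 = theta_ij[0]
    t10, t11 = theta_ij[1]
    same = (t00 + t11) / 2  # new value for (0,0) and (1,1)
    diff = (t01 + t10) / 2  # new value for (0,1) and (1,0)
    new_ij = [[same, diff], [diff, same]]
    # Node updates absorb the asymmetric part of the edge term.
    # The update for j is an assumption made by symmetry with the one for i.
    new_i = [theta_i[0], theta_i[1] + (t10 + t11 - t01 - t00) / 2]
    new_j = [theta_j[0], theta_j[1] + (t01 + t11 - t10 - t00) / 2]
    new_const = const + (t00 - t11) / 2
    return new_i, new_j, new_ij, new_const


def energy(theta_i, theta_j, theta_ij, const, xi, xj):
    return theta_i[xi] + theta_j[xj] + theta_ij[xi][xj] + const


# Made-up parameters for one edge; any values would do.
old = ([0.5, 1.5], [2.0, 0.0], [[1.0, 4.0], [2.0, 7.0]], 3.0)
new = symmetrize(old[0], old[1], old[2], old[3])
gaps = [abs(energy(*old, xi, xj) - energy(*new, xi, xj))
        for xi in (0, 1) for xj in (0, 1)]
is_symmetric = new[2][0][0] == new[2][1][1] and new[2][0][1] == new[2][1][0]
```

Every entry of `gaps` should be zero, meaning the energy of each of the four labelings is unchanged while the edge term becomes symmetric.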
We next show how to tighten this LP based on a connection to a recently proposed Bipartite Multi-cut problem [21].\r\n\r\n2.2 Bipartite Multi-cut based LP relaxation\r\n\r\nThe Bipartite Multi-cut (BMC) problem is a generalization of the standard s-t Min-cut problem. Given an undirected graph $J = (N, A)$ with non-negative edge weights, the s-t Min-cut problem finds the subset of edges with minimum total weight whose deletion disconnects $s$ and $t$. In BMC, we are given $k$ source-sink pairs $ST = \\{(s_1, t_1), \\ldots, (s_k, t_k)\\}$, and the goal is to find a subset of vertices $M \\subseteq N$ such that $|\\{s_i, t_i\\} \\cap M| = 1$ for every pair and the total weight of edges from $M$ to the remaining vertices $N - M$ is minimized. The BMC problem was recently proposed in [21] where it was shown to be NP-hard and $O(\\log k)$ approximable using a linear programming relaxation. The BMC problem is also related to the more popular Multi-cut problem where the goal is to identify the smallest weight set of edges such that every $s_i$ and $t_i$ are separated. Any feasible BMC solution is a solution to Multi-cut but not the other way round. To see this, consider a graph over six vertices $(s_1, s_2, s_3, t_1, t_2, t_3)$ and three edges $(s_1, s_3)$, $(t_1, t_2)$, $(s_2, t_3)$. If $ST = \\{(s_i, t_i) : 1 \\leq i \\leq 3\\}$, then all pairs in $ST$ are already separated and the optimal Multi-cut solution has cost 0. But for BMC one of the three edges has to be cut. The LP relaxations for Multi-cut provide only an $\\Omega(k)$ approximation to the BMC problem. We reduce the MAP estimation problem to the Bipartite Multi-cut problem on an optimized version of graph $H$ constructed so that the set of variables $R$ adjacent to non-submodular edges is minimized. Later in Section 2.3 we will show how to create such an optimized graph. Without loss of generality, we assume that the variables in $R$ are $0, 1, \\ldots, k$. The remaining variables $j \\in V - R$ do not need the $j_1$ copy of $j$ in $H$ since there are no edges adjacent to $j_1$. 
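The six-vertex example that separates BMC from Multi-cut can be verified by brute force. The following is a minimal sketch, assuming unit edge weights and assuming every vertex is a terminal (as in the example), so that choosing a side for each pair enumerates all feasible sets M; the function names are hypothetical.

```python
from itertools import product


def bmc_brute_force(edges, pairs):
    # Try every way of putting exactly one vertex of each pair into M
    # and return the cheapest total weight of edges crossing M.
    best = None
    for choice in product((0, 1), repeat=len(pairs)):
        M = {pair[c] for pair, c in zip(pairs, choice)}
        cost = sum(w for (u, v, w) in edges if (u in M) != (v in M))
        if best is None or cost < best:
            best = cost
    return best


def all_pairs_disconnected(edges, pairs):
    # Union-find over the edges: if every pair already lies in different
    # components, the empty edge set is a Multi-cut of cost 0.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v, _ in edges:
        parent[find(u)] = find(v)
    return all(find(s) != find(t) for s, t in pairs)


edges = [('s1', 's3', 1), ('t1', 't2', 1), ('s2', 't3', 1)]
pairs = [('s1', 't1'), ('s2', 't2'), ('s3', 't3')]
bmc_cost = bmc_brute_force(edges, pairs)
multicut_is_free = all_pairs_disconnected(edges, pairs)
```

With unit weights the Multi-cut optimum is 0 while every feasible BMC set cuts at least one edge, which matches the argument in the text.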
We create an instance of a Bipartite Multi-cut problem on $H$ with the source-sink pairs $ST = \\{(i_0, i_1) : 0 \\leq i \\leq k\\}$. Let $M$ be the subset of vertices output by BMC on this graph, and without loss of generality assume that $M$ contains $0_0$. The MAP labeling $x^*$ is obtained from $M$ by setting $x_i = s$ if $i_s \\in M$ and $x_i = \\bar{s}$ if $i_s \\in V^H - M$. This gives a valid MAP labeling because for each variable $j$ that appears in the set $R$, BMC ensures that $M$ contains exactly one of $(j_0, j_1)$. Using this connection, we tighten the Min-cut LP as follows. For each $u \\in \\{0_0, 0_1, \\ldots, k_0, k_1\\}$ and $j_s \\in V^H$ we define new variables $D_u(j_s)$ and use these to augment the Min-cut LP with additional constraints as follows:\r\n\r\n(BMC LP)\r\n$$\\min_{d_e, D_u(\\cdot)} \\sum_{e \\in E^H} w_e d_e$$\r\n$$d_e + D_u(i_s) - D_u(j_t) \\geq 0 \\quad \\forall e = (i_s, j_t) \\in E^H, \\; \\forall u \\in \\{0_0, 0_1, \\ldots, k_0, k_1\\}$$\r\n$$d_e + D_u(j_t) - D_u(i_s) \\geq 0 \\quad \\forall e = (i_s, j_t) \\in E^H, \\; \\forall u \\in \\{0_0, 0_1, \\ldots, k_0, k_1\\}$$\r\n$$D_{i_0}(i_1) \\geq 1 \\quad \\forall i \\in \\{0, \\ldots, k\\}$$\r\n$$D_u(j_s) \\geq 0 \\quad \\forall j_s \\in V^H, \\; \\forall u \\in \\{0_0, 0_1, \\ldots, k_0, k_1\\}$$\r\n$$d_e \\geq 0 \\quad \\forall e \\in E^H$$\r\n$$D_{i_0}(j_0) = D_{i_1}(j_1), \\quad D_{i_0}(j_1) = D_{i_1}(j_0) \\quad \\forall i, j \\in \\{0, \\ldots, k\\}$$\r\n\r\nA useful interpretation of the above LP is provided by viewing variables $d_e$ as the distance between $i_s$ and $j_t$ for any edge $e = (i_s, j_t)$, and variables $D_u(j_s)$ as the distance between $u$ and $j_s$. The first two constraints ensure that these distance variables satisfy the triangle inequality. These, along with the constraint $D_{i_0}(i_1) \\geq 1$, ensure that for every ST pair $(i_0, i_1)$, any path $P$ from $i_0$ to $i_1$ has $\\sum_{e \\in P} d_e \\geq 1$. In contrast, the Min-cut LP ensures this kind of separation only for the $(0_0, 0_1)$ terminal pair. Later, in Section 5 we will establish a connection between these constraints and cycle constraints [15, 16, 17, 18, 19]. When the LP returns integral solutions, we obtain an optimal MAP labeling using $M = \\{j_s : D_{0_0}(j_s) = 0\\}$. When the variables are not integral, [21] suggests a region growing approach for rounding them so as to get an $O(\\log k)$ approximation of the optimal objective. 
In practice, we found that ICM starting with fractional node assignments $x_i = D_{0_0}(i_0)$ gave better results.\r\n\r\n2.3 Reducing the size of ST set\r\n\r\nIn the LP above, for every edge that is non-submodular we add a terminal pair to $ST$ corresponding to either of its two endpoints. The problem of minimizing the size of the $ST$ set is equivalent to the problem of finding the minimum set $R$ of variables of $G$ such that all cycles with an odd number of non-submodular edges are covered. It is easy to see that in any such cycle, it is always possible to flip the variables such that any one selected edge is non-submodular and the rest are submodular. Since finding the optimal $R$ is NP-hard, we used the following heuristics. First, we pick the set of variables to flip so as to minimize the number of non-submodular edges, and then obtain a vertex cover of the reduced non-submodular edges using a greedy algorithm. Interestingly, this problem can be cast as a MAP inference problem on $G$ defined as follows: for each variable, label 0 denotes that the variable is not flipped and 1 denotes that the variable is flipped. Thus, if an edge is submodular and both variables attached to it are flipped (i.e., labeled 1) then the edge remains submodular. We need to minimize the number of non-submodular edges. Therefore, the energy function for this new graphical model will be\r\n$$\\theta_{ij}(x_i, x_j) = x_i \\oplus x_j \\oplus \\text{is\\_non\\_submodular}(i, j) \\quad \\forall (i,j) \\in E$$\r\n$$\\theta_i(0) = \\theta_i(1) = 0 \\quad \\forall i \\in V$$\r\nWhen $G$ is planar, for example a grid, the special structure of these potentials (Ising energy function) enables us to get an optimal solution using the matching algorithm of [24, 8]. With the above LP formulation, we were able to obtain exact solutions for most 20x20 grids and 25 node clique graphs. However, the LP does not scale beyond 30x30 grid and 50 node clique graphs. 
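The effect of flipping variables on submodularity, which underlies the heuristic of Section 2.3, can be illustrated on a small triangle. This is a sketch with hypothetical helper names and made-up potentials; it uses the standard test that an edge is submodular when its diagonal entries sum to no more than its off-diagonal entries.

```python
def is_submodular(theta):
    # theta[a][b] is the edge energy for labels (a, b).
    return theta[0][0] + theta[1][1] <= theta[0][1] + theta[1][0]


def flip(theta, flip_i, flip_j):
    # Relabel x_i -> 1 - x_i and/or x_j -> 1 - x_j on one edge.
    return [[theta[a ^ flip_i][b ^ flip_j] for b in (0, 1)] for a in (0, 1)]


def count_non_submodular(edge_thetas, flips):
    # flips[i] is 1 if variable i is flipped, 0 otherwise.
    return sum(not is_submodular(flip(th, flips[i], flips[j]))
               for (i, j), th in edge_thetas.items())


SUB = [[0, 1], [1, 0]]      # prefers equal labels
NONSUB = [[1, 0], [0, 1]]   # prefers different labels

# A triangle with an even number of non-submodular edges.
edge_thetas = {(0, 1): NONSUB, (1, 2): NONSUB, (0, 2): SUB}
before = count_non_submodular(edge_thetas, [0, 0, 0])
after = count_non_submodular(edge_thetas, [0, 1, 0])
```

Flipping variable 1 toggles both edges that touch it, so the two non-submodular edges become submodular and no terminal pair is needed for this cycle.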
We therefore provide a combinatorial algorithm for solving the LP.\r\n\r\n3 Combinatorial algorithm\r\n\r\nWe will adapt the primal-dual algorithm that was proposed in [22, 23] for solving the closely related Multi-cut problem. We review this algorithm in Section 3.1 and in Section 3.2 show how we adapt it to solve the BMC LP.\r\n\r\n3.1 Garg's algorithm for the Multi-cut problem\r\n\r\nRecall that in the Multi-cut problem, the goal is to remove the minimum weight set of edges so as to separate each $(s_i, t_i)$ pair in $ST$. This problem is formulated as the following primal-dual LP pair in [22].\r\n\r\nMulti-cut LP: Primal\r\n$$\\min_{d} \\sum_{e \\in E^H} w_e d_e$$\r\n$$\\sum_{e \\in P} d_e \\geq 1 \\quad \\forall P \\in \\mathcal{P}$$\r\n$$d_e \\geq 0 \\quad \\forall e \\in E^H$$\r\n\r\nMulti-cut LP: Dual\r\n$$\\max_{f} \\sum_{P \\in \\mathcal{P}} f_P$$\r\n$$\\sum_{P \\in \\mathcal{P}_e} f_P \\leq w_e \\quad \\forall e \\in E^H$$\r\n$$f_P \\geq 0 \\quad \\forall P \\in \\mathcal{P}$$\r\n\r\nwhere $\\mathcal{P}$ denotes all paths between a pair of vertices in $ST$ and $\\mathcal{P}_e$ denotes the set of paths in $\\mathcal{P}$ which contain edge $e$. Garg's algorithm [22, 23] simultaneously solves the primal and dual so that they are within an $\\epsilon$ factor of each other for any user-provided $\\epsilon > 0$. The algorithm starts by setting all dual flow variables to zero and all primal variables $d_e = \\delta$ where $\\delta$ is $(1 + \\epsilon)/((1 + \\epsilon)L)^{1/\\epsilon}$, and $L$ is the maximum number of edges for any path in $\\mathcal{P}$. It then iteratively updates the variables by first finding the shortest path $P \\in \\mathcal{P}$ which violates the $\\sum_{e \\in P} d_e \\geq 1$ constraint and then modifying the $f$ variables as $f_P = \\min_{e \\in P} w_e$, i.e., $f = f + f_P$ and $d_e = d_e (1 + \\epsilon f_P / w_e)$ $\\forall e \\in P$. At any point a feasible solution can be obtained by rescaling all the primal and dual variables. Termination is reached when the rescaled primal objective is within $(1 + \\epsilon)$ of the rescaled dual objective for error parameter $\\epsilon$. This process is shown to terminate in $O(m \\log_{1+\\epsilon} \\frac{1+\\epsilon}{\\delta})$ steps where $m = |E^H|$.\r\n\r\n3.2 Solving the BMC LP\r\n\r\nWe first modify the edge weights on graph $H$ constructed for the BMC LP so that for all edges $e = (i_s, j_t)$ and its complement $\\bar{e} = (i_{\\bar{s}}, j_{\\bar{t}})$, the weights are equal, that is, $w_e = w_{\\bar{e}}$. 
This can be easily ensured by setting $w_e = w_{\\bar{e}}$ = average of previous edge weights of $e$ and $\\bar{e}$ in $H$. This change adds all $(2n + 2)$ possible vertices to $H$, i.e., all nodes $0 \\leq i \\leq n$ contain terminal pairs $(i_0, i_1)$ in the $ST$ set. For any path $P$ in $H$ we define its complementary path $\\bar{P}$ to be the path obtained by reversing the order of edges and complementing all edges in $P$. For example, the complement of path $(2_0, 1_1, 3_0, 2_1)$ is $(2_0, 3_1, 1_0, 2_1)$. Next, we consider the following alternative LP called BMC-Sym LP for BMC on symmetric graphs, that is, graphs where $w_e = w_{\\bar{e}}$:\r\n\r\n(BMC-Sym LP)\r\n$$\\min \\sum_{e \\in E^H} w_e d_e$$\r\n$$\\sum_{e \\in P} d_e \\geq 1 \\quad \\forall P \\in \\mathcal{P}$$\r\n$$d_e \\geq 0, \\; d_e = d_{\\bar{e}} \\quad \\forall e \\in E^H$$\r\n\r\nLemma 1 When $H$ is symmetric, the BMC-Sym LP, BMC LP, and Multi-cut LP are equivalent.\r\n\r\nPROOF Any feasible solution of BMC-Sym LP can be used to obtain a solution to BMC LP with the same objective as follows: Set $d_e$ variables unchanged; this keeps the objective intact. Set $D_u(i_s)$ as the length of the shortest path between $u$ and $i_s$, that is, $D_u(i_s) = \\min_{P \\in \\text{paths}(u, i_s)} \\sum_{e \\in P} d_e$. This yields a feasible solution: the constraints $d_e + D_u(i_s) - D_u(j_t) \\geq 0$ hold because the $D_u(i_s)$ variables are shortest path lengths between $u$ and $i_s$. The constraints $D_{i_0}(i_1) \\geq 1$ hold because all paths between $i_0$ and $i_1$ have a distance $\\geq 1$ in BMC-Sym LP. The constraints $D_{i_0}(j_0) = D_{i_1}(j_1)$ and $D_{i_0}(j_1) = D_{i_1}(j_0)$ are satisfied because the distances are symmetric, $d_e = d_{\\bar{e}}$. We next show that any feasible solution of BMC LP gives a feasible solution to Multi-cut LP with the same $d_e$ and objective value. For any pair $(p_0, p_1) \\in ST$ the constraint $D_{p_0}(p_1) \\geq 1$ along with repeated application of $d_e + D_{p_0}(i_s) - D_{p_0}(j_t) \\geq 0$ ensures that $\\sum_{e \\in P} d_e \\geq 1$ for any path between $p_0$ and $p_1$. Finally, we show that if $\\{d_e\\}$ is a feasible solution to Multi-cut LP then it can be used to construct a feasible solution $\\{d'_e\\}$ to BMC-Sym LP without changing the value of the objective function using $d'_e = d'_{\\bar{e}} = (d_e + d_{\\bar{e}})/2$. 
The objective value remains unchanged since $w_e = w_{\\bar{e}}$. The path constraints hold for the averaged solution $d'_e = (d_e + d_{\\bar{e}})/2$, that is, $\\sum_{e \\in P} d'_e \\geq 1$, because both path $P$ and its complementary path $\\bar{P}$ are in $\\mathcal{P}$, and we know that $\\sum_{e \\in P} d_e \\geq 1$ and $\\sum_{e \\in P} d_{\\bar{e}} = \\sum_{e \\in \\bar{P}} d_e \\geq 1$.\r\n\r\nWe modify Garg's algorithm [22, 23] to exploit the fact that the graph is symmetric so that at each iteration we push twice the flow while keeping the approximation guarantees intact. The key change we make is that when augmenting flow $f$ in some path $P$, we augment the same flow $f$ to the complementary path $\\bar{P}$ as outlined in our final algorithm in Figure 1. This change ensures that we always obtain symmetric distance values as we prove below.\r\n\r\nLemma 2 Suppose $H$ is a symmetric graph. Then $d_e = d_{\\bar{e}}$ $\\forall e \\in E^H$ at the end of each iteration of the while loop in the algorithm in Figure 1.\r\n\r\nPROOF We prove by induction. The claim holds initially, since $d_e = \\delta$ $\\forall e \\in E^H$ and $H$ is symmetric. Let $P_i$ denote the path selected in the $i$th iteration of the algorithm. Now, suppose that the hypothesis is true for the $n$th iteration. In the $(n + 1)$th iteration, we augment flow $f$ in both paths $P_{n+1}$ and $\\bar{P}_{n+1}$. These paths $P_{n+1}$ and $\\bar{P}_{n+1}$ do not share any edge because this would imply that there is another pair $(j_0, j_1)$ of shorter length, and we would choose $P_{n+1}$ to be this path instead. We then do the following update $d_e = d_e (1 + \\epsilon f_P / w_e)$ with $f_P = \\min_{e \\in P} w_e$ for both the paths $P_{n+1}$ and $\\bar{P}_{n+1}$. Since $w_e = w_{\\bar{e}}$ for all $e \\in E^H$ and $d_e = d_{\\bar{e}}$ $\\forall e \\in E^H$ before this iteration, $d_e = d_{\\bar{e}}$ $\\forall e \\in E^H$ after the $(n + 1)$th step.\r\n\r\nTheorem 3 The modified algorithm also provides an $\\epsilon$-approximation to the BMC LP.\r\n\r\nPROOF Suppose we do not augment the flow in the complementary path $\\bar{P}$ while augmenting $P$. In the next iteration the original algorithm of [22, 23] picks $\\bar{P}$ or any path with the same path length, since the path lengths of $P$ and $\\bar{P}$ are equal before the iteration and they do not share any common edges. Therefore, by forcing $\\bar{P}$ we are not modifying the course of the original algorithm and the analysis in [22, 23] holds here as well. 
\r\n\r\nInput: Graphical model $G$ with reparameterized energy function $E$, approximation guarantee $\\epsilon$\r\nCreate symmetric graph $H$ from $G$ and $E$\r\nInitialize $d_e = \\delta$ ($\\delta$ derived from $\\epsilon$ as shown in Section 3.1), $f = 0$, $f_e = 0$, $x$ = arbitrary initial labeling of graphical model $G$\r\nDefine: Primal objective $P(\\{d_e\\}) = \\sum_e w_e d_e / \\min_{P \\in \\mathcal{P}} \\sum_{e \\in P} d_e$\r\nDefine: Dual objective $D(f, \\{f_e\\}) = f / (\\max_e f_e / w_e)$\r\nwhile $\\min(E(x) - \\text{const}, P(\\{d_e\\})) > (1 + \\epsilon) D(f, \\{f_e\\})$ do\r\n    $P$ = shortest path between $(i_0, i_1)$, over all $(i_0, i_1) \\in ST$\r\n    if $\\sum_{e \\in P} d_e < 1$ then\r\n        With $f_P = \\min_{e \\in P} w_e$ update $f = f + f_P$, $f_e = f_e + f_P$, $d_e = d_e (1 + \\epsilon f_P / w_e)$ $\\forall e \\in P$\r\n        Repeat above for the complement path $\\bar{P}$\r\n        $x'$ = current solution after rounding, $x$ = better of $x$ and $x'$\r\n    end if\r\nend while\r\nReturn bound = $D(f, \\{f_e\\}) + \\text{const}$, MAP = $x$.\r\n\r\nFigure 1: Combinatorial Algorithm for MAP inference using BMC.\r\n\r\nOur algorithm, in addition to updating the primal and dual solutions at each iteration, also keeps track of the primal objective obtained with the current best rounding ($x$ in Figure 1). Often, the rounded variables yielded lower primal objective values and led to early termination. The complexity of the algorithm can be shown to be $O(\\epsilon^{-2} k m^2)$ ignoring the polylog($m$) factors. Fleischer [25] subsequently improved the above algorithm by reducing the complexity to $O(\\epsilon^{-2} m^2)$. It is interesting to note that this running time is independent of $k$. Though we have presented our modification in terms of the algorithm in [22, 23], we can fit our algorithm in Fleischer's framework as well. In fact, we use Fleischer's modification for the practical implementation of our algorithm. 
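The multiplicative update at the heart of Section 3.1 can be sketched in a few lines. This is not the authors' implementation: it covers only the plain Multi-cut update of [22, 23], without the symmetric complement-path step, the rounding, or Fleischer's speed-up, and it simply runs until no terminal pair has a path shorter than 1. The function names are hypothetical, and using the number of edges as the path-length bound L is an assumption.

```python
import heapq


def shortest_path(adj, d, s, t):
    # Dijkstra under edge lengths d; returns (length, list of edge ids).
    dist, prev, done = {s: 0.0}, {}, set()
    heap = [(0.0, s)]
    while heap:
        du, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == t:
            break
        for v, e in adj.get(u, []):
            nd = du + d[e]
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                prev[v] = (u, e)
                heapq.heappush(heap, (nd, v))
    if t not in dist:
        return float('inf'), []
    path, v = [], t
    while v != s:
        v, e = prev[v]
        path.append(e)
    return dist[t], path


def garg_multicut(edges, weights, pairs, eps=0.1):
    m = len(edges)
    adj = {}
    for e, (u, v) in enumerate(edges):
        adj.setdefault(u, []).append((v, e))
        adj.setdefault(v, []).append((u, e))
    L = m  # assumption: crude bound on the number of edges in a simple path
    delta = (1 + eps) / ((1 + eps) * L) ** (1 / eps)
    d = [delta] * m       # primal distances d_e
    f_e = [0.0] * m       # flow routed through each edge
    f = 0.0               # total flow
    while True:
        length, path = min((shortest_path(adj, d, s, t) for s, t in pairs),
                           key=lambda r: r[0])
        if length >= 1 or not path:
            break
        f_p = min(weights[e] for e in path)
        f += f_p
        for e in path:
            f_e[e] += f_p
            d[e] *= 1 + eps * f_p / weights[e]
    # Rescale to feasible primal and dual values, as defined in Figure 1.
    alpha = min(shortest_path(adj, d, s, t)[0] for s, t in pairs)
    primal = sum(w * x for w, x in zip(weights, d)) / alpha
    congestion = max(fe / w for fe, w in zip(f_e, weights))
    dual = f / congestion if congestion > 0 else 0.0
    return primal, dual


# One terminal pair joined by a two-edge path; the minimum cut has weight 1.
primal, dual = garg_multicut([('s', 'a'), ('a', 't')], [2.0, 1.0], [('s', 't')])
```

On this toy instance the rescaled dual is a lower bound on the optimum 1 and the rescaled primal is an upper bound, so the two values bracket the LP optimum.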
\r\n\r\n[Figure 2 plots: three panels showing MAP Score/Clique Size, Bound/Clique Size, and Time in secs/Clique Size against Clique Size for BMC, MPLP, and TRW-S.]\r\nFigure 2: Clique size scaled values of MAP, Upper bound, and running time with increasing clique size on three methods: BMC, MPLP, and TRW-S.\r\n\r\n[Figure 3 plots: three panels showing Score against Time in seconds with curves Map_MPLP, Bound_MPLP, Map_BMC, and Bound_BMC: (a) Edge strength = 0.15, (b) Edge strength = 0.5, (c) Edge strength = 2.]\r\nFigure 3: Comparing convergence rates of BMC and MPLP for three different clique graphs.\r\n\r\n4 Experiments\r\n\r\nWe compare our proposed algorithm (called BMC here) with MPLP, a state-of-the-art message passing algorithm [1] that tightens the standard MAP LP with third order marginal constraints, which are equivalent to cycle constraints for binary MRFs. As reference we also present results for the TRW-S algorithm [9]. BMC is implemented in Java whereas for MPLP we ran the C++ code provided by the authors. We run BMC with $\\epsilon = 0.02$. MPLP was run with edge clusters until convergence (up to a precision of $2 \\times 10^{-4}$) or for at most 1000 iterations, whichever comes first. 
Our experiments were performed on two kinds of datasets: (1) Clique graph based binary MRFs of various sizes generated as per the method of [17] where edge potentials are Potts sampled from $U[-\\sigma, \\sigma]$ (our default setting was $\\sigma = 0.5$) and node potentials via $U[-1, 1]$, and (2) Maxcut instances of various sizes and densities from the BiqMac library (see footnote 3). Since the second task is formulated as a maximization problem, for the sake of consistency we report all our results as maximizing the MAP score. We compare the algorithms on the quality of the final solution, the upper bound to MAP score, and running time. It should be noted that multiplicative bounds do not hold here since the reparameterizations give rise to negative constants. In the graphs in Figure 2 we compare BMC, MPLP, and TRW-S with increasing clique size averaged over five seeds. We observe that BMC provides much higher MAP scores and slightly tighter bounds than MPLP. In terms of running time, BMC is more than an order of magnitude faster than MPLP for large graphs. The baseline LP (TRW-S), while much faster than both BMC and MPLP, provides much poorer MAP scores and bounds. We also compare BMC and MPLP on their speed of convergence. In Figures 3(a), (b), and (c) we show the MAP and upper bounds at different times in the execution of the algorithm on cliques of size 50 and different edge strengths. BMC, whose bounds and MAP appear as the two short arcs in between the MAP scores and bounds of MPLP, converges significantly faster and terminates well before MPLP while providing the same or better MAP scores and bounds for all edge strengths. In Table 1 we compare the three algorithms on the various graphs from the BiqMac library. The graphs are sorted by increasing density and are all of size 100. We observe that the MAP values for BMC are significantly higher than those for TRW-S. For MPLP, the MAP values are always zero because it decodes marginals purely based on node marginals which for these graphs are tied. 
The upper bounds achieved by MPLP are significantly tighter than TRW-S, showing that with proper rounding MPLP is likely to produce good MAP scores, but BMC provides even tighter bounds in most cases. The running time for BMC is significantly lower than MPLP for dense graphs but for sparse graphs (10% edges) it requires the same time as MPLP. Thus, overall we find that BMC achieves tighter bounds and better MAP solutions at a significantly faster rate than the state-of-the-art method for tightening LPs. The gain over MPLP is highest for the case of dense graphs. For sparse graphs many algorithms work, for example recently [8, 26] reported excellent results on planar, or nearly planar graphs and [27] show that even local search works when the graph is sparse.\r\n\r\nFootnote 3: http://biqmac.uni-klu.ac.at/\r\n\r\nGraph | density | MAP: BMC | MAP: MPLP | MAP: TRW-S | Bound: BMC | Bound: MPLP | Bound: TRW-S | Time (s): BMC | Time (s): MPLP | Time (s): TRW-S\r\npm1s | 0.1 | 110 | 0 | 91 | 131 | 200 | 257 | 45 | 43 | 0.005\r\npw01 | 0.1 | 1986 | 0 | 1882 | 2079 | 2397 | 2745 | 48 | 46 | 0.006\r\nw01 | 0.1 | 653 | 0 | 495 | 720 | 1115 | 1320 | 46 | 41 | 0.004\r\ng05 | 0.5 | 1409 | 0 | 1379 | 1650 | 1720 | 2475 | 761 | 317 | 0.021\r\npw05 | 0.5 | 7975 | 0 | 7786 | 9131 | 9195 | 13696 | 699 | 1139 | 0.021\r\nw05 | 0.5 | 1444 | 0 | 1180 | 2245 | 2488 | 6588 | 737 | 1261 | 0.021\r\npw09 | 0.9 | 13427 | 0 | 13182 | 16493 | 16404 | 24563 | 106 | 2524 | 0.041\r\nw09 | 0.9 | 1995 | 0 | 1582 | 4073 | 4095 | 11763 | 123 | 2671 | 0.053\r\npm1d | 0.99 | 347 | 0 | 277 | 842 | 924 | 2463 | 12 | 1307 | 0.047\r\n\r\nTable 1: Comparisons on Maxcut graphs of size 100 from the BiqMac library.\r\n\r\n5 Discussion and Conclusion\r\n\r\nWe put our tightening of the basic MAP LP (Marginal LP in Equation 2 or the Min-cut LP) in perspective with other proposed tightenings based on cycle constraints [17, 18, 1, 19] and higher order marginal constraints [17, 1, 20]. For binary MRFs cycle constraints are equivalent to adding marginal consistency constraints among triples of variables [28]. We show the relationship between cycle constraints and our constraints. 
Let $S = (V_S, E_S)$ denote the minimum cut graph created from $G$ as shown in Section 2.1 but without the $i_1$ vertices for $1 \\leq i \\leq n$, so that weights of non-submodular edges in $S$ will be negative. The LP relaxation of MAP based on cycle constraints is defined as:\r\n\r\n$$\\min_{d} \\sum_{e \\in E_S} w_e d_e$$\r\n$$\\sum_{e \\in F} (1 - d_e) + \\sum_{e \\in C \\setminus F} d_e \\geq 1 \\quad \\forall C \\in \\mathcal{C}, \\; F \\subseteq C \\text{ and } |F| \\text{ is odd}$$\r\n$$d_e \\in [0, 1] \\quad \\forall e \\in E_S$$\r\n\r\nwhere $\\mathcal{C}$ denotes the set of all cycles in $S$. Suppose we construct our symmetric minimum cut graph $H$ with edges $(i_s, j_t)$ corresponding to all four possible values of $(s, t)$ for each edge $(i, j) \\in E$, instead of the two that we currently get due to zero-normalized edge potentials. Then, BMC-Sym LP along with the constraints $d_{i_s j_t} + d_{i_s j_{\\bar{t}}} = 1$ $\\forall (i_s, j_t) \\in E^H$ is equivalent to the cycle LP above. We skip the proof due to lack of space. Our main contribution is that by relaxing the cycle LP to the Bipartite Multi-cut LP we have been able to design a combinatorial algorithm which is guaranteed to provide an $\\epsilon$-approximation to the LP in polynomial time. Since we solve the LP and its dual better than any of the earlier methods of enforcing cycle constraints, we are able to obtain tighter bounds and MAP scores at a considerably faster speed. Future work in this area includes developing a combinatorial algorithm for solving the semi-definite program in [21] and extending our approach to multi-label graphical models.\r\n\r\nAcknowledgement\r\nWe thank Naveen Garg for helpful discussion in relating the multi-commodity flow problem with the Bipartite multi-cut problem. The second author acknowledges the generous support of Microsoft Research and IBM's Faculty award.\r\n\r\nReferences\r\n[1] David Sontag, Talya Meltzer, Amir Globerson, Tommi Jaakkola, and Yair Weiss. Tightening LP Relaxations for MAP using Message Passing. In UAI, 2008.\r\n[2] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.\r\n[3] M.I. Schlesinger. 
Syntactic analysis of two-dimensional visual signals in noisy conditions. Kybernetica, 1976.\r\n[4] Chandra Chekuri, Sanjeev Khanna, Joseph (Seffi) Naor, and Leonid Zosin. Approximation Algorithms for the Metric Labeling Problem via a New Linear Programming Formulation. In SODA, 2001.\r\n[5] Jon Kleinberg and Eva Tardos. Approximation Algorithms for Classification Problems with Pairwise Relationships: Metric Labeling and Markov Random Fields. J. ACM, 49(5):616–639, 2002.\r\n[6] M. Wainwright, T. Jaakkola, and A. Willsky. MAP Estimation Via Agreement on Trees: Message-Passing and Linear Programming. IEEE Transactions on Information Theory, 51, 2005.\r\n[7] Tomáš Werner. A Linear Programming Approach to Max-Sum Problem: A Review. IEEE Trans. Pattern Anal. Mach. Intell., 29(7):1165–1179, 2007.\r\n[8] Nic Schraudolph. Polynomial-Time Exact Inference in NP-Hard Binary MRFs via Reweighted Perfect Matching. In AISTATS, 2010.\r\n[9] Vladimir Kolmogorov. Convergent Tree-Reweighted Message Passing for Energy Minimization. IEEE Trans. Pattern Anal. Mach. Intell., 28(10):1568–1583, 2006.\r\n[10] Talya Meltzer, Amir Globerson, and Yair Weiss. Convergent message passing algorithms - a unifying view. In UAI, 2009.\r\n[11] Pradeep Ravikumar, Alekh Agarwal, and Martin J. Wainwright. Message-passing for Graph-structured Linear Programs: Proximal Methods and Rounding Schemes. JMLR, 11:1043–1080, 2010.\r\n[12] David Sontag and Tommi Jaakkola. Tree Block Coordinate Descent for MAP in Graphical Models. In AISTATS, volume 9, pages 544–551, 2009.\r\n[13] Endre Boros and Peter L. Hammer. Pseudo-Boolean Optimization. Discrete Applied Mathematics, 123(1–3):155–225, 2002.\r\n[14] Carsten Rother, Vladimir Kolmogorov, Victor S. Lempitsky, and Martin Szummer. Optimizing Binary MRFs via Extended Roof Duality. In CVPR, 2007.\r\n[15] Francisco Barahona and Ali Ridha Mahjoub. On the cut polytope. Math. Program., 36(2):157–173, 1986.\r\n[16] Uri Zwick. 
Outward Rotations: A Tool for Rounding Solutions of Semidefinite Programming Relaxations, with Applications to MAX CUT and Other Problems. In STOC, 1999.\r\n[17] David Sontag and Tommi Jaakkola. New Outer Bounds on the Marginal Polytope. In NIPS, 2007.\r\n[18] M. Pawan Kumar, Vladimir Kolmogorov, and Philip H. S. Torr. An Analysis of Convex Relaxations for MAP Estimation of Discrete MRFs. JMLR, 10:71–106, 2009.\r\n[19] Nikos Komodakis and Nikos Paragios. Beyond Loose LP-Relaxations: Optimizing MRFs by Repairing Cycles. In ECCV, 2008.\r\n[20] Tomáš Werner. High-arity interactions, polyhedral relaxations, and cutting plane algorithm for soft constraint optimisation (MAP-MRF). In CVPR, 2008.\r\n[21] Sreyash Kenkre and Sundar Vishwanathan. Approximation algorithms for the Bipartite Multicut problem. Information Processing Letters, 110(8-9):282–287, 2010.\r\n[22] Naveen Garg, Vijay V. Vazirani, and Mihalis Yannakakis. Approximate Max-Flow Min-(Multi)Cut Theorems and Their Applications. SIAM J. Comput., 25(2):235–251, 1996.\r\n[23] Naveen Garg and Jochen Könemann. Faster and Simpler Algorithms for Multicommodity Flow and Other Fractional Packing Problems. SIAM J. Comput., 37(2):630–652, 2007.\r\n[24] Amir Globerson and Tommi Jaakkola. Approximate inference using planar graph decomposition. In NIPS, 2006.\r\n[25] Lisa Fleischer. Approximating Fractional Multicommodity Flow Independent of the Number of Commodities. SIAM J. Discrete Math., 13(4):505–520, 2000.\r\n[26] D. Batra, A. C. Gallagher, D. Parikh, and T. Chen. Beyond trees: MRF inference via outer-planar decomposition. In CVPR, 2010.\r\n[27] Kyomin Jung, Pushmeet Kohli, and Devavrat Shah. Local Rules for Global MAP: When Do They Work? In NIPS, 2009.\r\n[28] David Sontag. Cutting plane algorithms for variational inference in graphical models. Master's thesis, MIT, Department of Electrical Engineering and Computer Science, 2007.\r\n", "award": [], "sourceid": 4170, "authors": [{"given_name": "Sashank", "family_name": "J. 
Reddi", "institution": null}, {"given_name": "Sunita", "family_name": "Sarawagi", "institution": null}, {"given_name": "Sundar", "family_name": "Vishwanathan", "institution": null}]}