{"title": "An Integer Projected Fixed Point Method for Graph Matching and MAP Inference", "book": "Advances in Neural Information Processing Systems", "page_first": 1114, "page_last": 1122, "abstract": "Graph matching and MAP inference are essential problems in computer vision and machine learning. We introduce a novel algorithm that can accommodate both problems and solve them efficiently. Recent graph matching algorithms are based on a general quadratic programming formulation, which takes into consideration both unary and second-order terms reflecting the similarities in local appearance as well as in the pairwise geometric relationships between the matched features. In this case the problem is NP-hard and a lot of effort has been spent on efficiently finding approximate solutions by relaxing the constraints of the original problem. Most algorithms find optimal continuous solutions of the modified problem, ignoring during the optimization the original discrete constraints. The continuous solution is quickly binarized at the end, but very little attention is paid to this final discretization step. In this paper we argue that the stage in which a discrete solution is found is crucial for good performance. We propose an efficient algorithm, with climbing and convergence properties, that optimizes the quadratic score in the discrete domain, and it gives excellent results either by itself or by starting from the solution returned by any graph matching algorithm. In practice it outperforms state-of-the-art algorithms and it also significantly improves their performance if used in combination. When applied to MAP inference, the algorithm is a parallel extension of Iterated Conditional Modes (ICM) with climbing and convergence properties that make it a compelling alternative to the sequential ICM. 
In our experiments on MAP inference our algorithm proved its effectiveness by outperforming ICM and Max-Product Belief Propagation.", "full_text": "An Integer Projected Fixed Point Method for Graph Matching and MAP Inference\n\nMarius Leordeanu\nRobotics Institute\nCarnegie Mellon University\nPittsburgh, PA 15213\nleordeanu@gmail.com\n\nMartial Hebert\nRobotics Institute\nCarnegie Mellon University\nPittsburgh, PA 15213\nhebert@ri.cmu.edu\n\nRahul Sukthankar\nIntel Labs Pittsburgh\nPittsburgh, PA 15213\nrahuls@cs.cmu.edu\n\nAbstract\n\nGraph matching and MAP inference are essential problems in computer vision and machine learning. We introduce a novel algorithm that can accommodate both problems and solve them efficiently. Recent graph matching algorithms are based on a general quadratic programming formulation, which takes into consideration both unary and second-order terms reflecting the similarities in local appearance as well as in the pairwise geometric relationships between the matched features. This problem is NP-hard, therefore most algorithms find approximate solutions by relaxing the original problem. They find the optimal continuous solution of the modified problem, ignoring during optimization the original discrete constraints. Then the continuous solution is quickly binarized at the end, but very little attention is paid to this final discretization step. In this paper we argue that the stage in which a discrete solution is found is crucial for good performance. We propose an efficient algorithm, with climbing and convergence properties, that optimizes the quadratic score in the discrete domain, and it gives excellent results either by itself or by starting from the solution returned by any graph matching algorithm. In practice it outperforms state-of-the-art graph matching algorithms and it also significantly improves their performance if used in combination. 
When\napplied to MAP inference, the algorithm is a parallel extension of Iterated Con-\nditional Modes (ICM) with climbing and convergence properties that make it a\ncompelling alternative to the sequential ICM. In our experiments on MAP infer-\nence our algorithm proved its effectiveness by signi\ufb01cantly outperforming [13],\nICM and Max-Product Belief Propagation.\n\n1\n\nIntroduction\n\nGraph matching and MAP inference are essential problems in computer vision and machine learning\nthat are frequently formulated as integer quadratic programs, where obtaining an exact solution is\ncomputationally intractable. We present a novel algorithm, Integer Projected Fixed Point (IPFP), that\nef\ufb01ciently \ufb01nds approximate solutions to such problems. In this paper we focus on graph match-\ning, because it is in this area that we have extensively compared our algorithm to state-of-the-art\nmethods. Feature matching using pairwise constraints is gaining a widespread use in computer vi-\nsion, especially in shape and object matching and recognition. It is a generalization of the classical\ngraph matching problem, formulated as an integer quadratic program [1,3,4,5,7,8,16,17] that takes\ninto consideration both unary and second-order terms re\ufb02ecting the similarities in local appearance\n\n\fas well as in the pairwise geometric relationships between the matched features. The problem is\nNP-hard, and a lot of effort has been spent in \ufb01nding good approximate solutions by relaxing the\ninteger one-to-one constraints, such that the continuous global optimum of the new problem can\nbe found ef\ufb01ciently. In the end, little computational time is spent in order to binarize the solution,\nbased on the assumption that the continuous optimum is close to the discrete global optimum of\nthe original combinatorial problem. 
In this paper we show experimentally that this is not the case and that, in fact, carefully searching for a discrete solution is essential for maximizing the quadratic score. Therefore we propose an iterative algorithm that takes as input any continuous or discrete solution, possibly given by some other graph matching method, and quickly improves it by aiming to maximize the score of the original problem with its integer constraints. Each iteration consists of two stages and is loosely related to the Frank-Wolfe method (FW) [14, 15], a classical optimization algorithm from operations research. The first stage maximizes in the discrete domain a linear approximation of the quadratic function around the current solution, which gives a direction along which the second stage maximizes the original quadratic score in the continuous domain. Even though this second stage might find a non-discrete solution, the optimization direction given by the first stage is always towards an integer solution, which is often the same one found in the second stage. The algorithm always improves the quadratic score in the continuous domain, finally converging to a maximum. If the quadratic function is convex the solution at every iteration is always discrete and the algorithm converges in a finite number of steps. In the case of non-convex quadratic functions, the method tends to pass through or near discrete solutions and the best discrete solution encountered along the path is returned, which in practice is either identical or very close to the point of convergence. We have performed extensive experiments with our algorithm with excellent results, the most representative of which are shown in this paper. Our method clearly outperforms four state-of-the-art algorithms, and, when used in combination, the final solution is dramatically improved. 
Some re-\ncent MAP inference algorithms [11,12,13] for Markov Random Fields formulate the problem as\nan integer quadratic program, for which our algorithm is also well suited, as we later explain and\ndemonstrate in more detail.\n\nMatching Using Pairwise Constraints The graph matching problem, in its most recent and gen-\neral form, consists of \ufb01nding the indicator vector x\u2217 that maximizes a certain quadratic score func-\ntion:\nProblem 1:\n\nx\u2217 = argmax(xTMx) s. t. Ax = 1, x \u2208 {0, 1}n\n\n(1)\ngiven the one-to-one constraints Ax = 1, x \u2208 {0, 1}n, which require that x is an indicator vector\nsuch that xia = 1 if feature i from one image is matched to feature a from the other image and\nzero otherwise. Usually one-to-one constraints are imposed on x such that one feature from one\nimage can be matched to at most one other feature from the other image. In MAP inference prob-\nlems, only many-to-one constraints are usually required, which can be accommodated by the same\nformulation, by appropriately setting the constraints matrix A. In graph matching, M is usually\na symmetric matrix with positive elements containing the compatibility score functions, such that\nMia;jb measures how similar the pair of features (i, j) from one image is in both local appearance\nand pair-wise geometry with the pair of their candidate matches (a, b) from the other image. The dif-\n\ufb01culty of Problem 1 depends on the structure of this matrix M, but in the general case it is NP-hard\nand no ef\ufb01cient algorithm exists that can guarantee optimality bounds. Previous algorithms modify\nProblem 1, usually by relaxing the constraints on the solution, in order to be able to \ufb01nd ef\ufb01ciently\noptimal solutions to the new problem. For example, spectral matching [5] (SM) drops the constraints\nentirely and assumes that the leading eigenvector of M is close to the optimal discrete solution. 
It\nthen \ufb01nds the discrete solution x by maximizing the dot-product with the leading eigenvector of\nM. The assumption is that M is a slightly perturbed version of an ideal matrix, with rank-1, for\nwhich maximizing this dot product gives the global optimum. Later, spectral graph matching with\naf\ufb01ne constraints was developed [3] (SMAC), which \ufb01nds the optimal solution of a modi\ufb01ed score\nfunction, with a tighter relaxation that imposes the af\ufb01ne constraints Ax = 1 during optimization.\nA different, probabilistic interpretation, not based on the quadratic formulation, is given in [2] (PM),\nalso based on the assumption that M is close to a rank-1 matrix, which is the outer product of the\nvector of probabilities for each candidate assignment. An important observation is that none of the\nprevious methods are concerned with the original integer constraints during optimization, and the\n\ufb01nal post processing step, when the continuous solution is binarized, is usually just a very simple\nprocedure. They assume that the continuous solution is close to the discrete one. The algorithm\n\n\fwe propose here optimizes the original quadratic score in the continuous domain obtained by only\ndropping the binary constraints, but it always targets discrete solutions through which it passes most\nof the time. Note that even in this continuous domain the quadratic optimization problem is NP-\nhard, so we cannot hope to get any global optimality guarantees. But we do not lose much, since\nguaranteed global optimality for a relaxed problem does not require closeness to the global optimum\nof the original problem, a fact that is evident in most of our experiments. 
Our experimental results from Section 4 strongly suggest an important point: algorithms with global optimality properties in a loosely relaxed domain can often give relatively poor results in the original domain, and a well-designed procedure with local optimality properties in the original domain, such as IPFP, can have a greater impact on the final solution than the global optimality in the relaxed domain.\n\nOur algorithm aims to optimize the following continuous problem, in which we only drop the integer constraints from Problem 1:\n\nProblem 2:\n\n$x^* = argmax(x^T M x)$ s. t. $Ax = 1$, $x \u2265 0$    (2)\n\nNote that Problem 2 is also NP-hard, and it becomes a concave minimization problem, equivalent to Problem 1, when M is positive definite.\n\n2 Algorithm\n\nWe introduce our novel algorithm, Integer Projected Fixed Point (IPFP), that takes as input any initial solution, continuous or discrete, and quickly finds a solution obeying the initial discrete constraints of Problem 1 with a better score, most often significantly better than the initial one ($P_d$ from Step 2 is a projection on the discrete domain, discussed shortly afterwards):\n\n1. Initialize $x^* = x_0$, $S^* = x_0^T M x_0$, $k = 0$, where $x_i \u2265 0$ and $x \u2260 0$\n2. Let $b_{k+1} = P_d(M x_k)$, $C = x_k^T M (b_{k+1} - x_k)$, $D = (b_{k+1} - x_k)^T M (b_{k+1} - x_k)$\n3. If $D \u2265 0$ set $x_{k+1} = b_{k+1}$. Else let $r$ = min{$-C/D$, 1} and set $x_{k+1} = x_k + r(b_{k+1} - x_k)$\n4. If $b_{k+1}^T M b_{k+1} \u2265 S^*$ then set $S^* = b_{k+1}^T M b_{k+1}$ and $x^* = b_{k+1}$\n5. If $x_{k+1} = x_k$ stop and return the solution $x^*$\n6. Set $k = k + 1$ and go back to Step 2.\n\nThis algorithm is loosely related to the power method for eigenvectors, also used by spectral matching [5]: at Step 2 it replaces the fixed point iteration of the power method $v_{k+1} = P(M v_k)$, where $P$ is the projection on the unit sphere, with a similar iteration $b_{k+1} = P_d(M x_k)$, in which $P_d$ is the projection on the one-to-one (for graph matching) or many-to-one (for MAP inference) discrete constraints. Since all binary vectors in the given discrete domain have the same norm, $P_d$ boils down to finding the discrete vector $b_{k+1}$ = argmax $b^T M x_k$. This vector can be found in linear time for many-to-one constraints, and with the efficient Hungarian method for one-to-one constraints. Note that (see Proposition 1), in both cases (one-to-one or many-to-one constraints), the discrete $b_{k+1}$ is also the one maximizing the dot-product with $M x_k$ in the continuous domain $Ab = 1$, $b \u2265 0$. IPFP is also related to Iterated Conditional Modes (ICM) [10] used for inference in graphical models. In the domain of many-to-one constraints IPFP becomes an extension of ICM for which the updates are performed in parallel without losing the climbing property and the convergence to a discrete solution. Note that the fully parallel version of ICM is IPFP without Step 3: $x_{k+1} = P_d(M x_k)$. The theoretical results that we will present shortly are valid for both one-to-one and many-to-one constraints, with a few differences that we will point out when deemed necessary. The algorithm is basically a sequence of linear assignment (or independent labeling) problems, in which the next solution is found by using the previous one. 
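The six steps above can be sketched in Python for the many-to-one (MAP inference) case, where the projection P_d reduces to an independent argmax per node. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names (matvec, dot, project_many_to_one, ipfp), the flattened indexing node * n_labels + label, the max_iter cap and the toy matrix are all hypothetical choices made for illustration, and M is assumed to be symmetric.

```python
def matvec(M, x):
    # Dense matrix-vector product M x, with M given as a list of rows.
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def project_many_to_one(g, n_nodes, n_labels):
    # P_d for many-to-one constraints: each node independently takes the
    # label with the largest entry of g (ties go to the lowest label index).
    b = [0.0] * (n_nodes * n_labels)
    for i in range(n_nodes):
        block = g[i * n_labels:(i + 1) * n_labels]
        best = max(range(n_labels), key=lambda a: block[a])
        b[i * n_labels + best] = 1.0
    return b


def ipfp(M, x0, n_nodes, n_labels, max_iter=100):
    x = [float(v) for v in x0]
    best_x, best_score = x, dot(x, matvec(M, x))           # Step 1
    for _ in range(max_iter):
        g = matvec(M, x)
        b = project_many_to_one(g, n_nodes, n_labels)      # Step 2
        d = [bi - xi for bi, xi in zip(b, x)]
        C = dot(g, d)               # x_k^T M (b - x_k), using symmetry of M
        D = dot(d, matvec(M, d))    # (b - x_k)^T M (b - x_k)
        if D >= 0:                                         # Step 3
            x_next = b
        else:
            r = min(-C / D, 1.0)
            x_next = [xi + r * di for xi, di in zip(x, d)]
        score_b = dot(b, matvec(M, b))                     # Step 4
        if score_b >= best_score:
            best_x, best_score = b, score_b
        if x_next == x:                                    # Step 5
            break
        x = x_next                                         # Step 6
    return best_x, best_score


# Toy problem (made up): 2 nodes, 2 labels; index = node * n_labels + label.
M = [[1, 0, 2, 0],
     [0, 1, 0, 3],
     [2, 0, 1, 0],
     [0, 3, 0, 1]]
labels, score = ipfp(M, [0.5, 0.5, 0.5, 0.5], n_nodes=2, n_labels=2)
print(labels, score)
```

Starting from the uniform solution, this toy run assigns label 1 to both nodes, with quadratic score 8. For one-to-one constraints the per-node argmax would have to be replaced by a linear assignment solver, which is not shown here.
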
In practice the algorithm converges in about 5\u201310 steps, which makes it very efficient, with basically the same complexity as that of Step 2. Step 3 ensures that the quadratic score increases with each iteration. Step 4 guarantees that the binary solution returned is never worse than the initial solution. In practice, the algorithm significantly improves the initial binary solution, and the final continuous solution is most often discrete, and always close to the best discrete one found. In fact, in the case of MAP inference, it is guaranteed that the point of convergence is discrete, as a fixed point of $P_d$.\n\nIntuition The intuition behind this algorithm is the following: at every iteration the quadratic score $x^T M x$ is first approximated by the first order Taylor expansion around the current solution $x_k$: $x^T M x \u2248 x_k^T M x_k + 2 x_k^T M (x - x_k)$. This approximation is maximized within the discrete domain of Problem 1, at Step 2, where $b_{k+1}$ is found. From Proposition 1 (see next) we know that the same discrete $b_{k+1}$ also maximizes the linear approximation in the continuous domain of Problem 2. The role of $b_{k+1}$ is to provide a direction of largest possible increase (or ascent) in the first-order approximation, within both the continuous domain and the discrete domain simultaneously. Along this direction the original quadratic score can be further maximized in the continuous domain of Problem 2 (as long as $b_{k+1} \u2260 x_k$). At Step 3 we find the optimal point along this direction, also inside the continuous domain of Problem 2. 
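The line search of Step 3 has a closed form, because along the segment from x_k to b_{k+1} the score is the one-dimensional quadratic S_k + 2tC + t^2 D. The following is a minimal sketch of that step only; the function names and the numeric values of C and D are hypothetical and chosen purely for illustration.

```python
def score_on_segment(S_k, C, D, t):
    # Quadratic score at x_k + t * (b_{k+1} - x_k), for 0 <= t <= 1.
    return S_k + 2.0 * t * C + t * t * D


def step_size(C, D):
    # Maximizer of the quadratic above over [0, 1], assuming C >= 0.
    # D >= 0: the score is non-decreasing on [0, 1], so jump to b_{k+1}.
    # D < 0: the parabola opens downward and peaks at t = -C / D.
    if D >= 0:
        return 1.0
    return min(-C / D, 1.0)


for C, D in [(0.5, 3.5), (1.0, -4.0), (3.0, -2.0)]:
    r = step_size(C, D)
    print(C, D, r, score_on_segment(0.0, C, D, r))
```

Only the middle case stops strictly inside the segment (r = 0.25); in the other two the step goes all the way to the discrete point.
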
The hope, also confirmed in practice, is that the algorithm will tend to converge towards discrete solutions that are, or are close to, maxima of Problem 2.\n\n3 Theoretical Analysis\n\nProposition 1: For any vector $x \u2208 R^n$ there exists a global optimum $y^*$ of $x^T M y$ in the domain of Problem 2 that has binary elements (thus it is also in the domain of Problem 1).\n\nProof: Maximizing $x^T M y$ with respect to $y$, subject to $Ay = 1$ and $y \u2265 0$, is a linear program for which an integer optimal solution exists because the constraints matrix $A$ is totally unimodular [9]. This is true for both one-to-one and many-to-one constraints.\n\nIt follows that the maximization from Step 2, $b_{k+1}$ = argmax $b^T M x_k$ in the original discrete domain, also maximizes the same dot-product in the continuous domain of Problem 2, with relaxed constraints $Ax = 1$ and $x \u2265 0$. This ensures that the algorithm will always move towards some discrete solution that also maximizes the linear approximation of the quadratic function in the domain of Problem 2. Most often in practice, that discrete solution also maximizes the quadratic score, along the same direction and within the continuous domain. Therefore $x_k$ is likely to be discrete at every step.\n\nProperty 1: The quadratic score $x_k^T M x_k$ increases at every step $k$ and the sequence of $x_k$ converges.\n\nProof: For a given step $k$, if $b_{k+1} = x_k$ we have convergence. If $b_{k+1} \u2260 x_k$, let $x$ be a point on the line between $x_k$ and $b_{k+1}$, $x = x_k + t(b_{k+1} - x_k)$. For any $0 \u2264 t \u2264 1$, $x$ is in the feasible domain of Problem 2. Let $S_k = x_k^T M x_k$. Let us define the quadratic function $f_q(t) = x^T M x = S_k + 2tC + t^2 D$, which is the original function in the domain of Problem 2 on the line between $x_k$ and $b_{k+1}$. Since $b_{k+1}$ maximizes the dot product with $x_k^T M$ in the discrete (and the continuous) domain, it follows that $C \u2265 0$. 
We have two cases: $D \u2265 0$, when $x_{k+1} = b_{k+1}$ (Step 3) and $S_{k+1} = x_{k+1}^T M x_{k+1} = f_q(1) \u2265 S_k = x_k^T M x_k$; and $D < 0$, when the quadratic function $f_q(t)$ is concave, with its maximum in the domain of Problem 2 attained at the point $x_{k+1} = x_k + r(b_{k+1} - x_k)$. Again, it follows that $S_{k+1} = x_{k+1}^T M x_{k+1} = f_q(r) \u2265 S_k = x_k^T M x_k$. Therefore, the algorithm is guaranteed to increase the score at every step. Since the score function is bounded above on the feasible domain, it has to converge, which happens when $C = 0$.\n\nBy always improving the quadratic score in the continuous domain, at each step the next solution moves towards discrete solutions that are better suited for solving the original Problem 1.\n\nProperty 2: The algorithm converges to a maximum of Problem 2.\n\nProof: Let $x^*$ be a point of convergence. At that point the gradient $2Mx^*$ is non-zero since both $M$ and $x^*$ have positive elements and $(x^*)^T M x^* > 0$ (it is higher than the score at the first iteration, also greater than zero). Since $x^*$ is a point of convergence it follows that $C = 0$, that is, for any other $x$ in the continuous domain of Problem 2, $(x^*)^T M x^* \u2265 (x^*)^T M x$. This implies that for any direction vector $v$ such that $x^* + tv$ is in the domain of Problem 2 for a small enough $t > 0$, the dot-product between $v$ and the gradient of the quadratic score is less than or equal to zero, $(x^*)^T M v \u2264 0$, which further implies that $x^*$ is a maximum (local or global) of the quadratic score within the continuous domain of equality constraints $Ax^* = 1$, $x^* \u2265 0$.\n\nFor many-to-one constraints (MAP inference) it basically follows that the algorithm will converge to a discrete solution, since the strict (local and global) maxima of Problem 2 are in the discrete domain [12]. If the maximum is not strict, IPFP still converges to a discrete solution (which is also a local maximum): the one found at Step 2. 
This is another similarity with ICM, which also converges to a maximum. Therefore, combining ours with ICM cannot improve the performance of ICM, and vice-versa.\n\nProperty 3: If $M$ is positive semidefinite with positive elements, then the algorithm converges in a finite number of iterations to a discrete solution, which is a maximum of Problem 2.\n\nProof: Since $M$ is positive semidefinite we always have $D \u2265 0$, thus $x_k$ is discrete for every $k \u2265 1$. Since the number of discrete solutions is finite, the algorithm must converge in a finite number of steps to a local (or global) maximum, which must be discrete. This result holds for both one-to-one and many-to-one constraints.\n\nWhen $M$ is positive semidefinite, Problem 2 is a concave minimization problem for which it is well known that the global optimum has integer elements, so it is also a global optimum of the original Problem 1. In this case our algorithm is only guaranteed to find a local optimum in a finite number of iterations. Finding the global optimum of a concave minimization problem is notoriously difficult, since the problem can have an exponential number of local optima. In fact, if a large enough constant is added to the diagonal elements of $M$, every point in the original domain of possible solutions becomes a local optimum for one-to-one problems. Therefore adding a large constant to make the problem concave is not a good idea, even if the global optimum does not change. In practice $M$ is rarely positive semidefinite, but it can be close to being so if the first eigenvalue is much larger than the rest, which is the assumption made by the spectral matching algorithm, for example.\n\nProperty 4: If $M$ has non-negative elements and is rank-1, then the algorithm will converge and return the global optimum of the original problem after the first iteration.\n\nProof: Let $v$, \u03bb be the leading eigenpair of $M$. 
Then, since $M$ has non-negative elements both $v$ and \u03bb are positive. Since $M$ is also rank one, we have $M x_0$ = \u03bb$(v^T x_0) v$. Since both $x_0$ and $v$ have positive elements it immediately follows that $x_1$ after the first iteration is the indicator solution vector that maximizes the dot-product with the leading eigenvector ($v^T x_0 = 0$ is a very unlikely case that never happens in practice). It is clear that this vector is the global optimum, since in the rank-1 case we have $x^T M x$ = \u03bb$(v^T x)^2$ for any $x$.\n\nThe assumption that $M$ is close to being rank-1 is used by two recent algorithms, [2] and [5]. Spectral matching [5] also returns the optimal solution in this case, and it assumes that $M$ is an ideal rank-1 matrix to which a small amount of noise is added. Probabilistic graph matching [2] makes the rank-1 approximation by assuming that each second-order element $M_{ia;jb}$ is the product of the probability of feature $i$ being matched to $a$ and feature $j$ being matched to $b$, independently. However, instead of maximizing the quadratic score function, they use this probabilistic interpretation of the pair-wise terms and find the solution by looking for the closest rank-1 matrix to $M$ in terms of the KL-divergence. If the assumptions in [2] were perfectly met, then spectral matching, probabilistic graph matching and our algorithm would all return the same solution. For a comparison of all these algorithms on real world experiments please see the experiments section.\n\n4 Experiments\n\nWe first present some representative experiments on graph matching problems. We tested IPFP by itself, as well as in conjunction with other algorithms as a post-processing step. When used by itself IPFP is always initialized with a flat, uniform continuous solution. 
We followed the experiments of [6] in the case of outliers: we used the same cars and motorbikes image pairs, extracted from the Pascal 2007 database, the same features (oriented points extracted from contours) and the same second-order potentials $M_{ia;jb} = e^{-w^T g_{ia;jb}}$; $g_{ia;jb}$ is a five dimensional vector of deformations in pairwise distances and angles when matching features $(i, j)$ from one image to features $(a, b)$ from the other and $w$ is the set of parameters that control the weighting of the elements of $g_{ia;jb}$. We followed the setup from [6] exactly, in order to have a fair comparison of our algorithm against the results they obtained. Due to space limitations, we refer the interested reader to [6] for the details. These experiments are difficult due to the large number of outliers (on average 5 times more outliers than inliers), and, in the case of cars and motorbikes, also due to the large intra-category variations in shape present in the Pascal 2007 database.\n\nFigure 1: Results on motorbikes and cars averaged over 30 experiments: at each iteration the average score $x_k^T M x_k$ normalized by the ground truth score is displayed. The comparisons are not affected by this normalization, since all scores are normalized by the same value. Notice how quickly IPFP converges (fewer than 10 iterations).\n\nTable 1: Average matching rates for the experiments with outliers on cars and motorbikes from Pascal 07. Note that our algorithm by itself outperforms on average all the others by themselves. When the solution of other algorithms is the starting point of IPFP the performance is greatly improved.\n\nDataset | IPFP | SM | SMAC | GA | PM\nCars and Motorbikes: alone | 64.4% | 58.2% | 58.6% | 46.7% | 36.6%\nCars and Motorbikes: + IPFP | 64.4% | 67.0% | 66.2% | 66.3% | 67.2%\nCars and Motorbikes: Improvement | NA | +8.8% | +7.6% | +19.6% | +30.6%\n\n
By outliers we mean the features that have no ground\ntruth correspondences in the other image, and by inliers those that have such correspondences. As in\n[6] we allow outliers only in one of the images in which they are present in large number, the ratio\nof outliers to inliers varying from 1.5 to over 10. The ground truth correspondences were manually\nselected by the authors of [6].\nThe dif\ufb01culty of the matching problems is re\ufb02ected by the relatively low matching scores of all\nalgorithms (Table 1).\nIn order to ensure an optimal performance of all algorithms, we used the\nsupervised version of the graph matching learning method from [6]. Learning w was effective,\nimproving the performance by more than 15% on average, for all algorithms. The algorithms we\nchose for comparison and also for combining with ours are among the current state-of-the-art in\nthe literature: spectral matching with af\ufb01ne constraints (SMAC) [3], spectral matching (SM) [5],\nprobabilistic graph matching (PM) [2], and graduated assignment (GA) [4]. In Tables 1 and 2 we\nshow that in our experiments IPFP signi\ufb01cantly outperforms other state-of-the-art algorithms.\nIn our experiments we focused on two aspects. Firstly, we tested the matching rate of our algorithm\nagainst the others, and observed that it consistently outperforms them, both in the matching rate and\nin the \ufb01nal quadratic score achieved by the resulting discrete solution (see Tables 1, 2). Secondly,\nwe combined our algorithm, as a post-processing step, with the others and obtained a signi\ufb01cant im-\nprovement over the output matching rate and quadratic score of the other algorithms by themselves\n(see Figures 1, 2). 
In Figure 2 we show the quadratic score of our algorithm, per iteration, for several individual experiments, when it takes as initial solution the output of several other algorithms. The score at the first iteration is the score of the final discrete solution returned by those algorithms and the improvement in just a few iterations is substantial, sometimes more than doubling the final quadratic score reached by the other algorithms. In Figure 1 we show the average scores of our algorithm, over 30 different experiments on cars and motorbikes, per iteration, normalized by the score of the solutions given by the human ground truth labeling. We notice that regardless of the starting condition, the final scores are very similar, slightly above the value of 1 (Table 2), which means that the solutions reached are, on average, at least as good, in terms of the matching score function, as the manually picked solutions. None of the algorithms by themselves, except only for IPFP, reach this level of quality. We also notice that a quadratic score of 1 does not correspond to a perfect matching rate, which indicates the fact that besides the ground truth solution there are other solutions with high score.\n\nTable 2: Quadratic scores on the Cars and Motorbikes image sets (the higher, the better). $S^*$ is the score of the manually picked ground truth. Note that the ground truth score $S^*$ does not affect the comparison since it is the same normalization value for all algorithms. The \u201cConvergence to a binary solution\u201d row shows the average rate at which our algorithm converges to a discrete solution.\n\nExperiments on Cars and Motorbikes | IPFP | SM | SMAC | GA | PM\nAlone, avg $S_{max}/S^*$ | 1.081 | 0.781 | 0.927 | 0.623 | 0.4785\n+ IPFP, avg $S_{max}/S^*$ | 1.081 | 1.070 | 1.082 | 1.086 | 1.080\nConvergence to a binary solution | 86.7% | 93.3% | 86.7% | 93.3% | 86.7%\n\n
This is expected, given that the large number of outliers can easily introduce wrong solutions of high score. However, increasing the quadratic score does increase the matching rate, as can be seen by comparing the results in Tables 1 and 2.\n\nFigure 2: Experiments on cars and motorbikes: at each iteration the score $x_k^T M x_k$ normalized by the ground truth score is displayed for 30 individual matching experiments for our algorithm starting from different solutions (uniform, or given by some other algorithm).\n\nExperiments on MAP inference problems We believe that IPFP can have a greater impact in graph matching problems than in MAP inference ones, due to the lack of efficient, high-quality discretization procedures in the graph matching literature. In the domain of MAP inference for MRFs, it is important to note that IPFP is strongly related to the parallel version of Iterated Conditional Modes, but, unlike parallel ICM, it has climbing, strong convergence and local optimality properties. To see the applicability of our method to MAP inference, we tested it against sequential ICM, Max-Product BP with damping of oscillations (Table 3), the algorithm L2QP of [12], and the algorithm of [13] (CQP), which is based on a convex approximation. In the case of [12] and [13], which give continuous optimal solutions to a relaxed problem, a post-processing step is required for discretization. Note that the authors of [13] use ICM to obtain a binary solution. However, we wanted to emphasize the quality of the methods by themselves, without a powerful discretization step, and used ICM for comparisons separately. Thus, for discretization we used one iteration of ICM for both our L2QP [12] and CQP [13]. Both ICM and IPFP used as initial condition a uniform flat solution as in the case of graph matching. 
We used the same experimental setup as in [11] and [12], on graphs with different degrees of edge density (by generating random edges with a given probability, varying from 0.1 to 1). The values of the potentials were randomly generated as in [11] and [12], favoring the correct labels vs. the wrong ones. In Figure 3 we show the average scores normalized by the score of IPFP over 30 different experiments, for different probabilities of edge generation pEdge on graphs with 50 nodes and different numbers of possible labels per node. The most important observation is that both ICM and IPFP outperform L2QP and CQP by a wide margin on all problems without a single exception. In our experiments, on every single problem, IPFP outperformed ICM, while both IPFP and ICM outperformed both L2QP and CQP by a wide margin, which is reflected in the averages shown in Figure 3.\n\nTable 3: Average objective score over 30 different experiments on 4-connected and 8-connected planar graphs with 50 sites and 10 possible labels per site\n\nGraph type | IPFP | ICM | BP\n4-connected planar | 79.5 | 78.2 | 54.2\n8-connected planar | 126.0 | 123.2 | 75.4\n\nFigure 3: Average quadratic scores normalized by the score of IPFP, over 30 different experiments, for each probability of edge generation pEdge \u2208 {0.1, 0.2, ..., 1} and different numbers of labels, for graphs with 50 nodes. Note that IPFP consistently outperforms L2QP [12] and CQP [13] (by a wide margin) and ICM. Note that L2QP and CQP perform similarly for a small number of labels.\n\n5 Conclusion\n\nThis paper presents a novel and computationally efficient algorithm, Integer Projected Fixed Point (IPFP), that outperforms state-of-the-art methods for solving quadratic assignment problems in graph matching, and well-established methods in MAP inference such as BP and ICM. 
We analyze\nthe theoretical properties of IPFP and show that it has strong convergence and climbing guarantees.\nAlso, IPFP can be employed in conjunction with existing techniques, such as SMAC or SM for graph\nmatching or BP for inference, to achieve solutions that are significantly better than the ones produced\nby those methods alone. Furthermore, IPFP is very straightforward to implement and\nconverges in only 5\u201310 iterations in practice. Thus, IPFP is very well suited for addressing a broad\nrange of real-world problems in computer vision and machine learning.\n\n6 Acknowledgments\n\nThis work was supported in part by NSF Grant IIS0713406 and by the Intel Graduate Fellowship\nprogram.\n\nReferences\n\n[1] A. Berg, T. Berg and J. Malik. Shape matching and object recognition using low distortion correspondences.\nComputer Vision and Pattern Recognition, 2005\n[2] R. Zass and A. Shashua. Probabilistic Graph and Hypergraph Matching. Computer Vision and Pattern\nRecognition, 2008\n[3] T. Cour, P. Srinivasan and J. Shi. Balanced Graph Matching. Neural Information Processing Systems, 2006\n[4] S. Gold and A. Rangarajan. A graduated assignment algorithm for graph matching. Pattern Analysis and\nMachine Intelligence, 1996\n[5] M. Leordeanu and M. Hebert. A Spectral Technique for Correspondence Problems using Pairwise Constraints.\nInternational Conference on Computer Vision, 2005\n[6] M. Leordeanu and M. Hebert. Unsupervised Learning for Graph Matching. Computer Vision and Pattern\nRecognition, 2009\n[7] C. Schellewald and C. Schnorr. Probabilistic subgraph matching based on convex relaxation. EMMCVPR,\n2005\n[8] P.H.S. Torr. Solving Markov random fields using semi definite programming. Artificial Intelligence and\nStatistics, 2003\n[9] R. Burkard, M. Dell\u2019Amico and S. Martello. Assignment Problems. SIAM Publications, 2009\n[10] J. Besag. On the Statistical Analysis of Dirty Pictures. JRSS, 1986\n[11] T. Cour and J. Shi. 
Solving Markov Random Fields with Spectral Relaxation. International Conference\non Artificial Intelligence and Statistics, 2007\n[12] M. Leordeanu and M. Hebert. Efficient MAP approximation for dense energy functions. International\nConference on Machine Learning, 2006\n[13] P. Ravikumar and J. Lafferty. Quadratic Programming Relaxations for Metric Labeling and Markov Random\nField MAP Estimation. International Conference on Machine Learning, 2006\n[14] M. Frank and P. Wolfe. An algorithm for quadratic programming. Naval Research Logistics Quarterly,\n1956\n[15] N.W. Brixius and K.M. Anstreicher. Solving quadratic assignment problems using convex quadratic programming\nrelaxations. Optimization Methods and Software, 2001\n[16] J. Maciel and J.P. Costeira. A global solution to sparse correspondence problems. Pattern Analysis and\nMachine Intelligence, 2003\n[17] L. Torresani, V. Kolmogorov and C. Rother. Feature correspondence via graph matching: Models and\nglobal optimization. European Conference on Computer Vision, 2008\n", "award": [], "sourceid": 1069, "authors": [{"given_name": "Marius", "family_name": "Leordeanu", "institution": null}, {"given_name": "Martial", "family_name": "Hebert", "institution": null}, {"given_name": "Rahul", "family_name": "Sukthankar", "institution": null}]}