{"title": "A Graph Theoretic Additive Approximation of Optimal Transport", "book": "Advances in Neural Information Processing Systems", "page_first": 13836, "page_last": 13846, "abstract": "Transportation cost is an attractive similarity measure between probability distributions due to its many useful theoretical properties. However, solving optimal transport exactly can be prohibitively expensive. Therefore, there has been significant effort towards the design of scalable approximation algorithms. Previous combinatorial results [Sharathkumar, Agarwal STOC '12; Agarwal, Sharathkumar STOC '14] have focused primarily on the design of near-linear time multiplicative approximation algorithms. There has also been an effort to design approximate solutions with additive errors [Cuturi NIPS '13; Altschuler et al. NIPS '17; Dvurechensky et al. ICML '18; Quanrud SOSA '19] within a time bound that is linear in the size of the cost matrix and polynomial in $C/\delta$; here $C$ is the largest value in the cost matrix and $\delta$ is the additive error.
We present an adaptation of the classical graph algorithm of Gabow and Tarjan and provide a novel analysis of this algorithm that bounds its execution time by $O(\frac{n^2 C}{\delta} + \frac{nC^2}{\delta^2})$. Our algorithm is extremely simple and executes, for an arbitrarily small constant $\varepsilon$, only $\lfloor \frac{2C}{(1-\varepsilon)\delta} \rfloor + 1$ iterations, where each iteration consists only of a Dijkstra-type search followed by a depth-first search. We also provide empirical results that suggest our algorithm is competitive with respect to a sequential implementation of the Sinkhorn algorithm in execution time.
Moreover, our algorithm quickly computes a solution for very small values of $\delta$, whereas the Sinkhorn algorithm slows down due to numerical instability.", "full_text": "A Graph Theoretic Additive Approximation of Optimal Transport

Nathaniel Lahn
Department of Computer Science, Virginia Tech, Blacksburg, VA 24061
lahnn@vt.edu

Deepika Mulchandani
Virginia Tech, Blacksburg, VA 24061
deepikak@vt.edu

Sharath Raghvendra
Virginia Tech, Blacksburg, VA 24061
sharathr@vt.edu

Abstract

Transportation cost is an attractive similarity measure between probability distributions due to its many useful theoretical properties. However, solving optimal transport exactly can be prohibitively expensive. Therefore, there has been significant effort towards the design of scalable approximation algorithms. Previous combinatorial results [Sharathkumar, Agarwal STOC '12; Agarwal, Sharathkumar STOC '14] have focused primarily on the design of near-linear time multiplicative approximation algorithms. There has also been an effort to design approximate solutions with additive errors [Cuturi NIPS '13; Altschuler et al. NIPS '17; Dvurechensky et al. ICML '18; Quanrud SOSA '19] within a time bound that is linear in the size of the cost matrix and polynomial in C/δ; here C is the largest value in the cost matrix and δ is the additive error. We present an adaptation of the classical graph algorithm of Gabow and Tarjan and provide a novel analysis of this algorithm that bounds its execution time by $O(\frac{n^2 C}{\delta} + \frac{nC^2}{\delta^2})$. Our algorithm is extremely simple and executes, for an arbitrarily small constant ε, only $\lfloor \frac{2C}{(1-\varepsilon)\delta} \rfloor + 1$ iterations, where each iteration consists only of a Dijkstra-type search followed by a depth-first search.
We also provide empirical results that suggest our algorithm is competitive with respect to a sequential implementation of the Sinkhorn algorithm in execution time. Moreover, our algorithm quickly computes a solution for very small values of δ, whereas the Sinkhorn algorithm slows down due to numerical instability.

1 Introduction

Transportation cost has been successfully used as a measure of similarity between data sets such as point clouds, probability distributions, and images. Originally studied in operations research, the transportation problem is a fundamental problem where we are given a set A of 'demand' nodes and a set B of 'supply' nodes, with a non-negative demand $d_a$ at each node a ∈ A and a non-negative supply $s_b$ at each node b ∈ B. Let G(A, B) be a complete bipartite graph on A, B with n = |A| + |B|, where c(a, b) ≥ 0 denotes the cost of transporting one unit of supply from b to a; we assume that C is the largest cost of any edge in the graph. We assume that the cost function is symmetric, i.e., c(a, b) = c(b, a). Due to symmetry in costs, without loss of generality, we will assume throughout that the total supply is at most the total demand. Let $U = \sum_{b \in B} s_b$. A transport plan is a function $\sigma : A \times B \to \mathbb{R}_{\ge 0}$ that assigns a non-negative value σ(a, b) to every edge (a, b) with the constraints that the total supply coming into any node a ∈ A is at most $d_a$, i.e., $\sum_{b \in B} \sigma(a, b) \le d_a$, and the total supply leaving any node b ∈ B is at most $s_b$, i.e., $\sum_{a \in A} \sigma(a, b) \le s_b$.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

A maximum transport plan is one where, for every b ∈ B, $\sum_{a \in A} \sigma(a, b) = s_b$, i.e., every available supply is transported. The cost incurred by any transport plan σ, denoted by w(σ), is $\sum_{(a,b) \in A \times B} \sigma(a, b)\, c(a, b)$. In the transportation problem, we wish to compute the minimum-cost maximum transport plan.

There are many well-known special versions of the transportation problem. For instance, when all the demands and supplies are positive integers, the problem is the Hitchcock-Koopmans transportation problem. When the demand or supply at every node is 1, the problem becomes the assignment problem. When A and B are discrete probability distributions where each node has an associated probability, the total demand (resp. supply) equals 1, i.e., U = 1. This is the problem of computing the optimal transport distance between two distributions. When the cost of transporting between nodes is a metric, the optimal transport cost is also the Earth Mover's distance (EMD). If, instead, the cost between two nodes is the p-th power of some metric cost with p > 1, the optimal transport cost is also known as the p-Wasserstein distance. These special instances are of significant theoretical interest [1, 17, 20, 24, 28, 29] and also have numerous applications in operations research, machine learning, statistics, and computer vision [5, 6, 7, 10, 13, 27].

Related work: There are several combinatorial algorithms for the transportation problem. The classical Hungarian method computes an optimal solution for the assignment problem by using linear programming duality in $O(n^3)$ time [18]. In a seminal paper, Gabow and Tarjan applied the cost scaling paradigm and obtained an $O(n^{2.5} \log(nC))$ time algorithm for the assignment problem [14]. They extended their algorithm to the transportation problem with an execution time of $O((n^2\sqrt{U} + U \log U) \log(nC))$; their algorithm requires the demands, supplies, and edge costs to be integers.
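To make the transport-plan definitions from the introduction concrete, the following minimal sketch (ours, not the paper's code; all names are our own) evaluates the cost w(σ) of a plan and checks the feasibility and maximality constraints.

```python
# Illustrative sketch (not from the paper). A transport plan sigma is stored as a
# dense matrix plan[a][b]; demands/supplies/cost use the notation of Section 1.

def plan_cost(cost, plan):
    """w(sigma) = sum over (a, b) of sigma(a, b) * c(a, b)."""
    return sum(plan[a][b] * cost[a][b]
               for a in range(len(plan)) for b in range(len(plan[a])))

def is_feasible(plan, demands, supplies, tol=1e-9):
    """Supply into each a is at most d_a; supply out of each b is at most s_b."""
    into_a = [sum(row) for row in plan]
    out_b = [sum(plan[a][b] for a in range(len(plan))) for b in range(len(supplies))]
    return (all(x <= d + tol for x, d in zip(into_a, demands)) and
            all(x <= s + tol for x, s in zip(out_b, supplies)))

def is_maximum(plan, supplies, tol=1e-9):
    """Maximum plan: every available supply is transported (sum_a sigma = s_b)."""
    out_b = [sum(plan[a][b] for a in range(len(plan))) for b in range(len(supplies))]
    return all(abs(x - s) <= tol for x, s in zip(out_b, supplies))

# Two demand and two supply nodes; total supply (1.0) <= total demand (1.2).
demands = [0.7, 0.5]
supplies = [0.6, 0.4]
cost = [[1.0, 3.0],
        [2.0, 1.0]]
plan = [[0.6, 0.0],
        [0.0, 0.4]]
assert is_feasible(plan, demands, supplies)
assert is_maximum(plan, supplies)
print(plan_cost(cost, plan))  # 0.6*1.0 + 0.4*1.0 = 1.0
```

The tolerance arguments are only there to absorb floating-point noise; with the integer demands and supplies produced by the transformation of Section 1.1, exact comparisons would suffice.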
For the optimal transport problem, scaling the demands and supplies to integers will cause U to be Ω(n). Therefore, the execution time of the GT-Algorithm will be $\Omega(n^{2.5})$. Alternatively, one could use the demand scaling paradigm to obtain an execution time of $O(n^3 \log U)$ [12]. For integer supplies and demands, the problem of computing a minimum-cost maximum transport plan can be reduced to the problem of computing a minimum-cost maximum flow, and applying the result of [21] gives a $\tilde{O}(n^{2.5}\,\mathrm{poly}\log U)$¹ time algorithm.

We would also like to note that there is an $O(n^{\omega} C)$ time algorithm to compute an optimal solution for the assignment problem [23]; here, ω is the exponent of matrix multiplication time complexity. With the exception of this algorithm, much of the existing work on exact solutions has focused on the design of algorithms with an execution time polynomial in n, log U, and log C. All existing exact solutions, however, are quite slow in practice. This has shifted focus towards the design of approximation algorithms.

Fundamentally, two types of approximate transport plans have been considered. We refer to them as δ-approximate and δ-close transport plans and describe them next. Suppose σ* is a maximum transport plan with the smallest cost. A δ-approximate transport plan is one whose cost is within (1 + δ)w(σ*), whereas a transport plan σ is δ-close if its cost is at most an additive value Uδ larger than the optimal, i.e., w(σ) ≤ w(σ*) + Uδ. Note that, for discrete probability distributions, U = 1, and, therefore, a δ-close solution is within an additive error of δ from the optimal.

For metric and geometric costs, there are several algorithms to compute δ-approximate transport plans that execute in time near-linear in the input size and polynomial in (log n, log U, log C).
For\ninstance, \u03b4-approximate transport plans for the d-dimensional Euclidean assignment problem and\nthe Euclidean transportation problem can be computed in n(log n/\u03b4)O(d) log U time [17, 28]. For\nmetric costs, one can compute a \u03b4-approximate transport plan in \u02dcO(n2) time [1, 30]. There are no\nknown \u03b4-approximation algorithms that execute in near-linear time when the costs are arbitrary.\nThere are several algorithms that return \u03b4-close transport plan for any arbitrary cost function. Among\nthese, the best algorithms take \u02dcO(n2(C/\u03b4)) time. In fact, Blanchet et al. [8] showed that any algorithm\nwith a better dependence on C/\u03b4 can be used to compute a maximum cardinality matching in any\narbitrary bipartite graph in o(n5/2) time. Therefore, design of improved algorithms with sub-linear\ndependence on (C/\u03b4) seems extremely challenging. See Table 1 for a summary of various results\nfor computing a \u03b4-close transport plan. Note that all previous results have one or more factors of\n\n1We use the notation \u02dcO to suppress additional logarithmic terms in n.\n\n2\n\n\fTable 1: A summary of existing algorithms for computing a \u03b4-close transport plan.\n\nAlgorithm\nAltschuler et al., \u201917\nDvurechensky et al., \u201918\nLin et al., \u201919\nQuanrud, \u201919\nBlanchet et al., \u201919\nOur Result\n\nTime Complexity\n\u02dcO(n2(C/\u03b4)3) [3]\n\u221a\n\u02dcO(min(n9/4\nC/\u03b4, n2C/\u03b42) [11] 2\n\u221a\n\u02dcO(min(n2C\n\u03b3/\u03b4, n2(C/\u03b4)2) [22] 2\n\u02dcO(n2C/\u03b4) [25]\n\u02dcO(n2C/\u03b4) [8]\nO(n2C/\u03b4 + n(C/\u03b4)2)\n\nlog n in their execution time. While some of these poly-logarithmic factors are artifacts of worst-case\nanalyses of the algorithms, one cannot avoid them all-together in any practical implementation.\nDue to this, only a small fraction of these results have reasonable implementation that also perform\nwell in practical settings. 
We would like to highlight the results of Cuturi [9], Altschuler et al. [3], and Dvurechensky et al. [11], all of which are based on the Sinkhorn projection technique. All these implementations, however, suffer from a significant increase in running time and numerical instability for smaller values of δ. These algorithms are practical only when δ is moderately large.

In an effort to design scalable solutions, researchers have explored several avenues. For instance, there has been an effort to design parallel algorithms [15]. Another avenue for speeding up algorithms is to exploit the cost structure. For instance, Sinkhorn-based algorithms can exploit the structure of squared Euclidean distances, leading to an $O(n(C/\delta)^d \log^d n)$ time algorithm that produces a δ-close transport plan [4].

Our results and approach: We present a deterministic primal-dual algorithm to compute a δ-close solution in $O(n^2(C/\delta) + n(C/\delta)^2)$ time; note that $n^2(C/\delta)$ is the dominant term in the execution time provided C/δ is O(n). Our algorithm is an adaptation of a single scale of Gabow and Tarjan's scaling algorithm for the transportation problem. Our key contribution is a diameter-sensitive analysis of this algorithm. The dominant term in the execution time is linear in the size of the cost matrix and linear in C/δ. The previous results that achieve such a bound are randomized and have additional logarithmic factors [8, 25], whereas our algorithm does not have any logarithmic factors and is deterministic.
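For context on the instability discussed above, the Sinkhorn projection technique can be sketched as alternating row and column rescalings of the kernel $K = \exp(-c/\eta)$. The function below is a minimal illustration in our own naming, not the implementations of [3, 9, 11]. Obtaining a small additive error requires taking the regularization η roughly proportional to δ, and the kernel entries $\exp(-C/\eta)$ then underflow in floating point, which is the source of the instability.

```python
import math

def sinkhorn(cost, r, c, eta, iters=1000):
    """Minimal Sinkhorn sketch (illustrative only): rescale K = exp(-cost/eta)
    so its row sums approach r and its column sums approach c. For small eta,
    exp(-C/eta) underflows, which is the numerical instability discussed above."""
    n, m = len(cost), len(cost[0])
    K = [[math.exp(-cost[i][j] / eta) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        for i in range(n):
            u[i] = r[i] / sum(K[i][j] * v[j] for j in range(m))
        for j in range(m):
            v[j] = c[j] / sum(K[i][j] * u[i] for i in range(n))
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Two uniform marginals; eta = 0.1 is large enough to stay numerically stable.
cost = [[0.0, 2.0], [2.0, 0.0]]
P = sinkhorn(cost, r=[0.5, 0.5], c=[0.5, 0.5], eta=0.1)
assert all(abs(sum(row) - 0.5) < 1e-6 for row in P)  # marginals are matched
```

Rerunning this with, say, eta=1e-4 makes most entries of K underflow to zero, which is the failure mode the combinatorial algorithm of this paper avoids.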
Furthermore, we can also exploit the cost structure to improve the execution time of our algorithm to $\tilde{O}(n(C/\delta)^2)$ for several geometric costs.

We transform our problem to one with integer demands and supplies in $O(n^2)$ time (Section 1.1). Given the transformed demands and supplies, our algorithm (in Section 2) scales down the cost of every edge (a, b) to $\lfloor \frac{2c(a,b)}{(1-\varepsilon)\delta} \rfloor$ for an arbitrarily small constant 0 < ε < 1, and executes at most $\lfloor 2C/((1-\varepsilon)\delta) \rfloor + 1$ phases. Within each phase, the algorithm executes two steps. The first step (also called the Hungarian Search) executes Dijkstra's algorithm ($O(n^2)$) and adjusts the weights corresponding to a dual linear program to find an augmenting path consisting of zero-slack edges. The second step executes DFS from every node with an excess supply and finds augmenting paths of zero-slack edges to route these supplies. The time taken by this step is $O(n^2)$ for the search plus additional time proportional to the sum of the lengths of all the augmenting paths found by the algorithm. We bound this total length of paths found during the course of the algorithm by $O(\frac{n}{\varepsilon(1-\varepsilon)}(C/\delta)^2)$.

Comparison with Gabow-Tarjan: Our algorithm can be seen as executing a single scale of Gabow and Tarjan's algorithm for carefully scaled integer demand, integer supply, and integer cost functions. Let U be the total integer supply. Our analysis differs from Gabow and Tarjan's analysis in the following ways. Gabow and Tarjan's algorithm computes an optimal solution only when the total supply U is equal to the total demand. In fact, there has been substantial effort in extending it to the unbalanced case [26]. Our transformation to integer demands and supplies in Section 1.1 makes the problem inherently unbalanced.
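The two scalings just mentioned, integerizing the demands and supplies (detailed in Section 1.1) and rounding the edge costs down, can be sketched as follows. This is an illustrative rendering under our own naming, not the paper's code; it uses $\alpha = \frac{2nC}{\varepsilon U \delta}$ and the scaled cost $\lfloor 2c(a,b)/((1-\varepsilon)\delta) \rfloor$ from the text.

```python
import math

def scale_instance(demands, supplies, C, n, eps, delta):
    """Sketch of the Section 1.1 transformation: with alpha = 2nC/(eps*U*delta),
    demands are scaled up (ceiling) and supplies are scaled down (floor), so the
    total supply still does not exceed the total demand and all quantities
    become integers."""
    U = sum(supplies)
    alpha = 2.0 * n * C / (eps * U * delta)
    d_scaled = [math.ceil(d * alpha) for d in demands]
    s_scaled = [math.floor(s * alpha) for s in supplies]
    return d_scaled, s_scaled, alpha

def scale_costs(cost, eps, delta):
    """Scaled integer costs: floor(2 c(a,b) / ((1 - eps) * delta))."""
    dp = (1.0 - eps) * delta
    return [[math.floor(2.0 * c / dp) for c in row] for row in cost]

demands = [0.7, 0.5]   # total demand 1.2
supplies = [0.6, 0.4]  # total supply U = 1.0
d2, s2, alpha = scale_instance(demands, supplies, C=3.0, n=4, eps=0.5, delta=0.1)
assert sum(s2) <= sum(d2)                         # supplies stay below demands
assert all(isinstance(x, int) for x in d2 + s2)   # everything is integral
c_bar = scale_costs([[1.0, 3.0], [2.0, 1.0]], eps=0.5, delta=0.1)
assert all(isinstance(x, int) and x >= 0 for row in c_bar for x in row)
```

Scaling supplies down and demands up is what keeps the invariant "total supply at most total demand" intact after rounding; the per-node rounding error of at most 1/α is what Section 1.1 later converts into the εUδ accuracy loss.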
However, we identify that the difficulty with unbalanced demand and supply arises only when the algorithm executes multiple scales. We provide a proof that our algorithm works for the unbalanced case (see Lemma 2.1). To bound the number of phases by $O(\sqrt{U})$ and the length of the augmenting paths by $O(U \log U)$, Gabow and Tarjan's proof requires the optimal solution to be of O(U) cost. We use a very different argument to bound the number of phases. Our proof (see Lemma 2.3 and Lemma 2.4) is direct and does not have any dependencies on the cost of the optimal solution.

²These results have an additional data-dependent parameter in the running time. For the result of Lin et al., γ = O(n) is this parameter.

Experimental evaluations: We contrast the empirical performance of our algorithm with the theoretical bounds presented in this paper and observe that the total length of the augmenting paths is substantially smaller than the worst-case bound of $n(C/\delta)^2$. We also compare our implementation with sequential implementations of Sinkhorn projection-based algorithms on real-world data. Our algorithm is competitive with respect to other methods for moderate and large values of δ. Moreover, unlike Sinkhorn projection-based approaches, our algorithm is numerically stable and executes efficiently for smaller values of δ. We present these comparisons in Section 3.

Extensions: In Section 4, we discuss a faster implementation of our algorithm using a dynamic weighted nearest neighbor data structure, with an execution time of $\tilde{O}(n(C/\delta)\Phi(n))$. Here, Φ(n) is the query and update time of this data structure.
As consequences, we obtain an $\tilde{O}(n(C/\delta))$ time algorithm to compute δ-close optimal transport for several settings, including when $A, B \subset \mathbb{R}^2$ and the costs are Euclidean distances or squared Euclidean distances.

1.1 Scaling demands and supplies

In this section, we transform the demands and supplies to integer demands and supplies. By doing so, we are able to apply the traditional framework of augmenting paths and find an approximate solution to the transformed problem in $O(\frac{n^2 C}{\delta} + \frac{nC^2}{\delta^2})$ time. Finally, this solution is mapped to a feasible solution for the original demands and supplies. The total loss in accuracy in the cost due to this transformation is at most εUδ.

Let 0 < ε < 1. Set $\alpha = \frac{2nC}{\varepsilon U \delta}$. Let I be the input instance for the transportation problem with each demand location a ∈ A having a demand of $d_a$ and each supply location b ∈ B having a supply of $s_b$. We create a new input instance I′ by scaling the demand at each node a ∈ A to $\lceil d_a \alpha \rceil$ and scaling the supply at each node b ∈ B to $\lfloor s_b \alpha \rfloor$. Since we scale the supplies by α and round them down, the total supply of I′ satisfies

$\sum_{b \in B} \lfloor s_b \alpha \rfloor \le \alpha \sum_{b \in B} s_b = \alpha U.$    (1)

Recollect that for any input I to the transportation problem, the total supply is no more than the total demand. Since the new supplies are scaled by α and rounded down whereas the new demands are scaled by α and rounded up, the total supply in I′ remains no more than the total demand. Let σ′ be any feasible maximum transport plan for I′. Now consider a transport plan σ that sets, for each edge (a, b), σ(a, b) = σ′(a, b)/α. As described below, the transport plan σ is not necessarily feasible
As described below, the transport plan \u03c3 is not necessarily feasible\nor maximum for I.\n\nof any node b \u2208 B is(cid:80)\nthan da, i.e.,(cid:80)\n\n(i) \u03c3 is not necessarily a maximum transport plan for I since the total supplies transported out\na\u2208A \u03c3(cid:48)(a, b)/\u03b1 = (cid:98)\u03b1sb(cid:99)/\u03b1 \u2264 sb. Note that the\n(ii) \u03c3 is not a feasible plan for I since the total demand met at any node a \u2208 A can be more\nb\u2208B \u03c3(cid:48)(a, b)/\u03b1 = da/\u03b1 = (cid:100)\u03b1da(cid:101)/\u03b1 \u2265 da. Note that the\n\nexcess supply remaining at any node b \u2208 B is \u03bab = sb \u2212 (cid:98)\u03b1sb(cid:99)/\u03b1 \u2264 1/\u03b1.\n\na\u2208A \u03c3(a, b) =(cid:80)\n\nb\u2208B \u03c3(a, b) =(cid:80)\n\nexcess supply that reaches node a \u2208 A, \u03baa \u2264 (cid:100)\u03b1da(cid:101)/\u03b1 \u2212 da \u2264 \u03b1da+1\n\n\u03b1 \u2212 da = 1/\u03b1.\n\nThe cost of \u03c3, w(\u03c3) = w(\u03c3(cid:48))/\u03b1. We can convert \u03c3 to a feasible and maximum transport plan for I\nin two steps.\nFirst, one can convert \u03c3 to a feasible solution. The excess supply \u03baa that reaches a demand node\na \u2208 A can be removed by iteratively picking an arbitrary edge incident on a, say the edge (a, b), and\nreducing \u03c3(a, b) as well as \u03baa by min{\u03baa, \u03c3(a, b)}. This iterative process is applied until \u03baa reduces\nto 0. This step is also repeated at every demand node a \u2208 A with an \u03baa > 0. The total excess supply\na\u2208A \u03baa \u2264 n/\u03b1. Combined\nwith the left-over supply from (i), the total remaining supply in \u03c3 is at most 2n/\u03b1. \u03c3 is now a feasible\ntransportation plan with an excess supply of at most 2n/\u03b1. 
Since the supplies transported along edges only reduce, the cost w(σ) ≤ w(σ′)/α.

Second, to convert this feasible plan σ to a maximum transport plan, one can simply match the remaining 2n/α supplies arbitrarily to leftover demands at a cost of at most C per unit of supply. The cost of this new transport plan increases by at most 2nC/α, and so

$w(\sigma) \le w(\sigma')/\alpha + \frac{2nC}{\alpha} \le w(\sigma')/\alpha + \varepsilon U \delta.$    (2)

Recollect that σ* is the optimal solution for I. Let σ′_OPT be the optimal solution for the input instance I′. In Lemma 1.1 (the proof of which is in the supplement), we show that w(σ′_OPT) ≤ αw(σ*). In Section 2, we show how to construct a transport plan σ′ whose cost, measured on I′, is at most w(σ′_OPT) + (1 − ε)δ times the total supply of I′, which from Lemma 1.1 and equation (1) is at most αw(σ*) + (1 − ε)(αU)δ. By combining this with equation (2), the solution produced by our algorithm satisfies w(σ) ≤ w(σ*) + (1 − ε)Uδ + εUδ = w(σ*) + Uδ.

Lemma 1.1. Let α > 0 be a parameter. Let I be the original instance of the transportation problem and let I′ be an instance scaled by α. Let σ* be the minimum-cost maximum transport plan for I and let σ′_OPT be a minimum-cost maximum transport plan for I′. Then w(σ′_OPT) ≤ αw(σ*).

2 Algorithm for scaled demands and supplies

The input I′ consists of a set of demand nodes A with a demand of $d_a$ for each node a ∈ A and a set of supply nodes B with a supply of $s_b$ for each node b ∈ B, along with the cost matrix; in this section, U denotes the total (integer) supply of I′. Let σ′_OPT be the optimal transport plan for I′. We present a variant of Gabow and Tarjan's algorithm that produces a plan σ′ with w(σ′) ≤ w(σ′_OPT) + (1 − ε)δU in $O(\frac{n^2 C}{(1-\varepsilon)\delta} + \frac{nC^2}{\varepsilon(1-\varepsilon)\delta^2})$ time. We obtain our result by setting ε to a constant such as ε = 0.5.

Definitions and notations: Let δ′ = (1 − ε)δ. We say that a vertex a ∈ A (resp. b ∈ B) is free with respect to a transport plan σ if $d_a - \sum_{b \in B} \sigma(a, b) > 0$ (resp. $s_b - \sum_{a \in A} \sigma(a, b) > 0$). At any stage in our algorithm, we use $A_F$ (resp. $B_F$) to denote the set of free demand nodes (resp. supply nodes). Let $\bar{c}(a, b) = \lfloor 2c(a, b)/\delta' \rfloor$ be the scaled cost of any edge (a, b). Recollect that w(σ) is the cost of any transport plan σ with respect to c(·,·). Similarly, we use $\bar{w}(\sigma)$ to denote the cost of any transport plan with respect to $\bar{c}(\cdot,\cdot)$.

This algorithm is based on a primal-dual approach. The algorithm, at all times, maintains a transport plan that satisfies the dual feasibility conditions. Given a transport plan σ along with a dual weight y(v) for every v ∈ A ∪ B, we say that σ, y(·) is 1-feasible if, for any two nodes a ∈ A and b ∈ B,

$y(a) + y(b) \le \bar{c}(a, b) + 1$   if $\sigma(a, b) < \min\{s_b, d_a\}$,    (3)
$y(a) + y(b) \ge \bar{c}(a, b)$   if $\sigma(a, b) > 0$.    (4)

These feasibility conditions are identical to the ones introduced by Gabow and Tarjan, but for costs that are scaled by 2/δ′ and rounded down. We refer to a 1-feasible transport plan that is maximum as a 1-optimal transport plan.
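The 1-feasibility conditions are straightforward to verify for a candidate plan and set of dual weights. The sketch below (our naming, illustrative only, not the paper's code) checks conditions (3) and (4) against the scaled costs.

```python
import math

def scaled_cost(c, delta_p):
    """c_bar(a, b) = floor(2 c(a, b) / delta')."""
    return math.floor(2.0 * c / delta_p)

def is_one_feasible(cost, plan, y_A, y_B, demands, supplies, delta_p):
    """Check 1-feasibility: for all a in A, b in B,
       y(a) + y(b) <= c_bar(a,b) + 1  whenever sigma(a,b) < min(s_b, d_a)  (3)
       y(a) + y(b) >= c_bar(a,b)      whenever sigma(a,b) > 0              (4)"""
    for a in range(len(demands)):
        for b in range(len(supplies)):
            cb = scaled_cost(cost[a][b], delta_p)
            if plan[a][b] < min(supplies[b], demands[a]) and y_A[a] + y_B[b] > cb + 1:
                return False  # condition (3) violated
            if plan[a][b] > 0 and y_A[a] + y_B[b] < cb:
                return False  # condition (4) violated
    return True

# The all-zero plan with all-zero dual weights is 1-feasible: condition (3)
# holds because scaled costs are non-negative, and condition (4) is vacuous.
cost = [[1.0, 3.0], [2.0, 1.0]]
assert is_one_feasible(cost, [[0, 0], [0, 0]], [0, 0], [0, 0],
                       demands=[2, 1], supplies=[1, 1], delta_p=0.5)
```

This is exactly why the algorithm of Section 2.1 can start from the empty plan with all dual weights zero.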
Note that Gabow and Tarjan's algorithm is defined for the balanced transportation problem, where a maximum transport plan also satisfies all demands. In our case, however, there may still be unsatisfied demands. To handle them, we introduce the following additional condition. Consider any 1-optimal transport plan σ such that, for every demand node a ∈ A,

(C) the dual weight y(a) ≤ 0 and, if a is a free demand node, then y(a) = 0.

In Lemma 2.1, we show that any 1-optimal transport plan σ with dual weights y(·) satisfying (C) has the desired cost bound, i.e., w(σ) ≤ w(σ′_OPT) + δ′U.

Lemma 2.1. Let σ along with dual weights y(·) be a 1-optimal transport plan that satisfies (C). Let σ′_OPT be a minimum-cost maximum transport plan. Then, w(σ) ≤ w(σ′_OPT) + δ′U.

In the rest of this section, we describe an algorithm to compute a 1-optimal transport plan that satisfies (C). To assist in describing this algorithm, we introduce a few definitions.

For any 1-feasible transport plan σ, we construct a directed residual graph with the vertex set A ∪ B and denote it by $\vec{G}_\sigma$. The edge set of $\vec{G}_\sigma$ is defined as follows: for any (a, b) ∈ A × B, if σ(a, b) = 0, we add an edge directed from b to a and set its residual capacity to $\min\{d_a, s_b\}$. Otherwise, if $\sigma(a, b) = \min\{d_a, s_b\}$, we add an edge from a to b with a residual capacity of σ(a, b). In all other cases, i.e., $0 < \sigma(a, b) < \min\{d_a, s_b\}$, we add an edge from a to b with a residual capacity of σ(a, b) and an edge from b to a with a residual capacity of $\min\{d_a, s_b\} - \sigma(a, b)$.
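The three cases of this construction collapse into two uniform rules: there is a b → a arc whenever $\sigma(a, b) < \min\{d_a, s_b\}$, carrying the unused capacity, and an a → b arc whenever σ(a, b) > 0, carrying the current flow. The sketch below (our naming, illustrative only) builds the residual edge list this way.

```python
def residual_edges(plan, demands, supplies):
    """Sketch of the residual graph construction described above. Returns a
    list of (tail, head, capacity) triples; nodes are ('A', a) and ('B', b)."""
    edges = []
    for a in range(len(demands)):
        for b in range(len(supplies)):
            cap = min(demands[a], supplies[b])
            f = plan[a][b]
            if f < cap:   # b -> a arc with the remaining (unused) capacity
                edges.append((('B', b), ('A', a), cap - f))
            if f > 0:     # a -> b arc carrying the currently shipped flow
                edges.append((('A', a), ('B', b), f))
    return edges

demands, supplies = [2, 1], [1, 1]
plan = [[1, 0], [0, 0]]                    # one unit shipped from b0 to a0
E = residual_edges(plan, demands, supplies)
assert (('A', 0), ('B', 0), 1) in E        # arc undoing the shipped unit
assert (('B', 1), ('A', 0), 1) in E        # unused edge at full capacity
```

When σ(a, b) = 0 only the first rule fires, when $\sigma(a, b) = \min\{d_a, s_b\}$ only the second fires, and strictly in between both fire, matching the three cases in the text.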
Any edge of $\vec{G}_\sigma$ directed from a ∈ A to b ∈ B is called a backward edge, and any edge directed from b ∈ B to a ∈ A is called a forward edge. We set the cost of any edge between a and b, regardless of its direction, to $\bar{c}(a, b) = \lfloor 2c(a, b)/\delta' \rfloor$. Any directed path in the residual network from a free supply vertex to a free demand vertex is called an augmenting path. Note that an augmenting path alternates between forward and backward edges, with the first and the last edge of the path being forward edges. We can augment the supplies transported by k ≥ 1 units along an augmenting path P as follows: for every forward edge (a, b) on the path P, we raise the flow σ(a, b) ← σ(a, b) + k, and for every backward edge (a, b) on the path P, we reduce the flow σ(a, b) ← σ(a, b) − k. We define the slack on any edge between a and b in the residual network as

$s(a, b) = \bar{c}(a, b) + 1 - y(a) - y(b)$   if (a, b) is a forward edge,    (5)
$s(a, b) = y(a) + y(b) - \bar{c}(a, b)$   if (a, b) is a backward edge.    (6)

Finally, we call an edge (a, b) of $\vec{G}_\sigma$ admissible if s(a, b) = 0. The admissible graph $\vec{A}_\sigma$ is the subgraph of $\vec{G}_\sigma$ consisting of the admissible edges of the residual graph.

2.1 The algorithm

Initially, σ is the transport plan where, for every edge (a, b) ∈ A × B, σ(a, b) = 0. We set the dual weight of every vertex v ∈ A ∪ B to 0, i.e., y(v) = 0. Note that σ and y(·) together form a 1-feasible transport plan. Our algorithm executes in phases and terminates when σ becomes a maximum transport plan. Each phase consists of two steps. In the first step, the algorithm conducts a Hungarian Search and adjusts the dual weights so that there is at least one augmenting path of admissible edges.
In the second step, the algorithm computes at least one augmenting path and updates σ by augmenting along all paths computed. At the end of the second step, we guarantee that there is no augmenting path of admissible edges. The details are presented next.

First step (Hungarian Search): To conduct a Hungarian Search, we add two additional vertices s and t to the residual network. We add edges directed from s to every free supply node, i.e., every node b with $\sum_{a \in A} \sigma(a, b) < s_b$. We add edges from every free demand vertex to t. All edges incident on s and t are given a weight of 0. The weight of every other edge (a, b) of the residual network is set to its slack s(a, b), based on its direction. We refer to the residual graph with the two additional vertices as the augmented residual network and denote it by $G_\sigma$. We execute Dijkstra's algorithm from s in the augmented residual network $G_\sigma$. For any vertex v ∈ A ∪ B, let $\ell_v$ be the shortest-path distance from s to v in $G_\sigma$. Next, the algorithm performs a dual weight adjustment. For any vertex v ∈ A ∪ B, if $\ell_v \ge \ell_t$, the dual weight of v remains unchanged. Otherwise, if $\ell_v < \ell_t$, we update the dual weight as follows: (U1) if v ∈ A, we set $y(v) \leftarrow y(v) - \ell_t + \ell_v$; (U2) otherwise, if v ∈ B, we set $y(v) \leftarrow y(v) + \ell_t - \ell_v$.

This completes the description of the first step of the algorithm. The dual updates guarantee that, at the end of this step, the transport plan σ along with the updated dual weights remains 1-feasible and there is at least one augmenting path in the admissible graph.

Second step (partial DFS): Initially, $\mathcal{A}$ is set to the admissible graph, i.e., $\mathcal{A} \leftarrow \vec{A}_\sigma$. Let X denote the set of free supply nodes in $\mathcal{A}$. The second step of the algorithm iteratively initiates a DFS from each supply node of X in the graph $\mathcal{A}$.
We describe the procedure for one free supply node b ∈ X. During the execution of the DFS from b, if a free demand node is visited, then an augmenting path P has been found; the DFS terminates immediately, and the algorithm deletes all edges visited by the DFS except for the edges of P. The AUGMENT procedure augments σ along P and updates X to be the set of free supply nodes in $\mathcal{A}$. Otherwise, if the DFS ends without finding an augmenting path, the algorithm deletes all vertices and edges that were visited by the DFS from $\mathcal{A}$ and updates X to be the set of free supply nodes remaining in $\mathcal{A}$. The second step ends when X becomes empty.

Augment procedure: For any augmenting path P starting at a free supply vertex b ∈ $B_F$ and ending at a free demand vertex a ∈ $A_F$, its bottleneck edge set is the set of all edges (u, v) on P with the smallest residual capacity. Let bc(P) denote the capacity of any edge in the bottleneck edge set. The bottleneck capacity $r_P$ of P is the smallest of the total remaining supply at b, the total remaining demand at a, and the residual capacity of its bottleneck edges, i.e., $r_P = \min\{s_b - \sum_{a' \in A} \sigma(a', b),\; d_a - \sum_{b' \in B} \sigma(a, b'),\; bc(P)\}$. The algorithm augments along P by updating σ as follows: for every forward edge (a′, b′), we set σ(a′, b′) ← σ(a′, b′) + $r_P$, and, for every backward edge (a′, b′), σ(a′, b′) ← σ(a′, b′) − $r_P$. The algorithm then updates the residual network and the admissible graph to reflect the new transport plan.

Invariants: The following invariants (proofs of which are in the supplement) hold during the execution of the algorithm. (I1) The algorithm maintains a 1-feasible transport plan. (I2) In each phase, the partial DFS step computes at least one augmenting path.
Furthermore, at the end of the partial DFS, there is no augmenting path in the admissible graph.

Correctness: From (I2), the algorithm augments, in each phase, the transport plan by at least one unit of supply. Therefore, when the algorithm terminates, we have a 1-feasible (from (I1)) maximum transport plan, i.e., a 1-optimal transport plan. Next, we show that any transport plan maintained by the algorithm satisfies condition (C). For v ∈ A, initially y(v) = 0. In any phase, suppose $\ell_v < \ell_t$. Then, the Hungarian Search updates the dual weights using rule (U1), which reduces the dual weight of v. Therefore, y(v) ≤ 0.

Next, we show that all free vertices of A have a dual weight of 0. The claim is true initially. During the course of the algorithm, any vertex a ∈ A whose demand is met can no longer become free. Therefore, it is sufficient to argue that no free demand vertex v experiences a dual adjustment. By construction, there is a directed edge from v to t with zero cost in $G_\sigma$. Therefore, $\ell_t \le \ell_v$, and the algorithm will not update the dual weight of v during the phase. As a result, the algorithm maintains y(v) = 0 for every free demand vertex, and (C) holds.

When the algorithm terminates, we obtain a 1-optimal transport plan σ which satisfies (C). From Lemma 2.1, it follows that w(σ) ≤ w(σ′_OPT) + Uδ′, as desired. The following lemma helps in achieving a diameter-sensitive analysis of our algorithm.

Lemma 2.2. The dual weight of any free supply node b ∈ $B_F$ is at most $\lfloor 2C/\delta' \rfloor + 1$.

Proof. For the sake of contradiction, suppose a free supply node b ∈ $B_F$ has a dual weight $y(b) \ge \lfloor 2C/\delta' \rfloor + 2$. Any free demand node a ∈ $A_F$ has a dual weight y(a) = 0 (from (C)).
Then, $y(a) + y(b) \ge \lfloor 2C/\delta' \rfloor + 2 \ge c(a, b) + 2$, and the edge $(a, b)$ violates the 1-feasibility condition (3), leading to a contradiction.

Efficiency: Let $\mathcal{P}_j$ be the set of all augmenting paths computed in phase $j$ and let $\mathcal{P}$ be the set of all augmenting paths computed by the algorithm across all phases. To bound the execution time of the algorithm, we bound, in Lemma 2.3, the total number of phases by $\lfloor 2C/\delta' \rfloor + 1$. Within each phase, the Hungarian search step executes a single Dijkstra search, which takes $O(n^2)$ time. To bound the time taken by the partial DFS step, observe that any edge visited by the DFS is deleted, provided it does not lie on an augmenting path. Edges that lie on an augmenting path, however, can be visited again by another DFS within the same phase. Therefore, the total time taken by the partial DFS step in any phase $j$ is bounded by $O(n^2 + \sum_{P \in \mathcal{P}_j} |P|)$; here $|P|$ is the number of edges on the augmenting path $P$. Across all $O(C/\delta')$ phases, the total time taken is $O((C/\delta')n^2 + \sum_{P \in \mathcal{P}} |P|)$. In Lemma 2.4, we bound the total length of the augmenting paths by $O(\frac{n}{\varepsilon(1-\varepsilon)}(C/\delta)^2)$. Therefore, the total execution time of the algorithm is $O(\frac{n^2 C}{(1-\varepsilon)\delta} + \frac{nC^2}{\varepsilon(1-\varepsilon)\delta^2})$.

Lemma 2.3. The total number of phases in our algorithm is at most $\lfloor 2C/\delta' \rfloor + 1$.

Proof. At the start of any phase, from (I2), there are no admissible augmenting paths. Therefore, any path from $s$ to $t$ in the augmented residual network $G_\sigma$ will have a cost of at least $1$, i.e., $\ell_t \ge 1$. During any phase, let $b \in B_F$ be any free supply vertex. Note that $b$ was also a free supply vertex in all prior phases. Since there is a direct edge from $s$ to $b$ with a cost of $0$ in the augmented residual network, $\ell_b = 0$. Since $\ell_t \ge 1$, from (U2), the dual weight of $b$ increases by at least $1$.
After $\lfloor 2C/\delta' \rfloor + 2$ phases, the dual weight of any free vertex will be at least $\lfloor 2C/\delta' \rfloor + 2$, which contradicts Lemma 2.2.

Lemma 2.4. Let $\mathcal{P}$ be the set of all augmenting paths produced by the algorithm. Then $\sum_{P \in \mathcal{P}} |P| = O(\frac{nC^2}{\varepsilon(1-\varepsilon)\delta^2})$; here $|P|$ is the number of edges on the path $P$.

Figure 1: Efficiency statistics for our algorithm when executed on very small $\delta$ values.

3 Experimental Results

In this section, we investigate the practical performance of our algorithm. We test an implementation of our algorithm³, written in Java, on discrete probability distributions derived from real-world image data. The testing code is written in MATLAB and calls the compiled Java code. All tests are executed on a computer with a 2.40 GHz Intel Dual Core i5 processor and 8 GB of RAM, using a single computation thread. We implement the worst-case asymptotically optimal algorithm as presented in this paper and fix the value of $\varepsilon$ at $0.5$. We compare our implementation to existing implementations of the Sinkhorn, Greenkhorn⁴, and APDAGD⁵ algorithms, all of which are written in MATLAB. Unless otherwise stated, we set all parameters of these algorithms to the values prescribed by the theoretical analysis presented in the respective papers.

All the tests are conducted on real-world data generated by randomly selecting pairs of images from the MNIST data set of handwritten digits. We set supplies and demands based on pixel intensities, and normalize them such that the total supply and total demand are both equal to $1$. The cost of an edge is assigned based on the squared Euclidean distance between pixel coordinates, and costs are scaled such that the maximum edge cost is $C = 1$.
The MNIST images are $28 \times 28$ pixels, so each image has $784$ pixels.

In the first set of tests, we compare the empirical performance of our algorithm to its theoretical analysis from Section 2. We execute 100 runs, where each run executes our algorithm on a randomly selected pair of MNIST images for $\delta \in [0.0001, 0.1]$. For each value of $\delta$, we record the wall-clock running time, iteration count, and total augmenting path length of our algorithm. These values, averaged over all runs, are plotted in Figure 1. We observe that the number of iterations is significantly less than the theoretical bound of roughly $\frac{2C}{(1-\varepsilon)\delta} = 4/\delta$. We also observe that the total augmenting path length is significantly smaller than the worst-case bound of $n/\delta^2$, even for the very small value $\delta = 0.0001$. This is for two reasons. First, inequality (8) (proof of Lemma 2.4 in the supplement), which is used in bounding the augmenting path length, is rarely tight; most augmenting paths have large slack with respect to this inequality. Second, to obtain the worst-case bound, we assume that only one unit of flow is pushed through each augmenting path (see the proof of Lemma 2.4). In our experiments, however, augmenting paths increased the flow by a larger value whenever possible. As a result, the total time taken by augmentations, even for the smallest value of $\delta$, was negligible: at most 2% of the execution time.

Next, we compare the number of iterations executed by our algorithm with the number of iterations executed by the Sinkhorn, Greenkhorn, and APDAGD algorithms. Here, by "iteration" we refer to the logical division of each algorithm into portions that take $O(n^2)$ time. For example, an iteration of the Greenkhorn algorithm corresponds to $n$ row/column updates. An iteration of our algorithm corresponds to a single phase.
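The preprocessing described above, normalizing pixel intensities into supply and demand distributions and building a squared-Euclidean cost matrix scaled so that $C = 1$, can be sketched as follows. This is an illustrative sketch, not the paper's Java implementation; the function name and array layout are ours.

```python
import numpy as np

def image_to_instance(img_a, img_b):
    """Build an optimal-transport instance from two grayscale images of
    equal shape (a sketch of the experimental preprocessing)."""
    # Supplies and demands: pixel intensities, each normalized to sum to 1.
    supply = img_a.ravel() / img_a.sum()
    demand = img_b.ravel() / img_b.sum()
    # Cost matrix: squared Euclidean distance between pixel coordinates,
    # scaled so that the maximum edge cost is C = 1.
    rows, cols = img_a.shape
    coords = np.array([(i, j) for i in range(rows) for j in range(cols)],
                      dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    cost = (diff ** 2).sum(axis=-1)
    cost /= cost.max()
    return supply, demand, cost
```

For $28 \times 28$ MNIST images this yields two distributions of length $784$ and a $784 \times 784$ cost matrix, matching the instance sizes used in the experiments.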
We execute 10 runs, where each run selects a random pair of MNIST images and executes all four algorithms using $\delta$ values in the range $[0.025, 0.2]$.

Figure 2(a) depicts the average number of iterations for each algorithm. For the Sinkhorn, Greenkhorn, and APDAGD algorithms, we see a significant increase in iterations as we choose smaller values of $\delta$. We believe this is because of the numerical instability associated with these algorithms. Unlike these algorithms, our algorithm runs fairly quickly for very small values of $\delta$ (see Figure 1).

³Our implementation is available at https://github.com/nathaniellahn/CombinatorialOptimalTransport.
⁴Sinkhorn/Greenkhorn implementations retrieved from https://github.com/JasonAltschuler/OptimalTransportNIPS17.
⁵APDAGD implementation retrieved from https://github.com/chervud/AGD-vs-Sinkhorn.

Figure 2: (a) A comparison of the number of iterations executed by various algorithms for moderate values of $\delta$; (b) a comparison of our algorithm with the Sinkhorn algorithm using several $\delta$ values. We compare running times when both algorithms receive $\delta$ as input (left) and when Sinkhorn receives $5\delta$ while our algorithm receives $\delta$ (right).

We repeat the test from Figure 2(a) and compare the average time taken to produce the result. The implementations of Greenkhorn and APDAGD are not optimized for actual running time, so we restrict ourselves to comparing our algorithm with the Sinkhorn algorithm and record the average wall-clock running time. The results of executing 100 runs are plotted on the left in Figure 2(b). Under these conditions, we observe that the time taken by our algorithm is significantly less than that taken by the Sinkhorn algorithm. The cost of the solution produced by our algorithm, although within the error parameter $\delta$, was not always better than the cost of the solution produced by Sinkhorn.
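For context, the Sinkhorn baseline alternates row and column scalings of the kernel matrix $K = e^{-\mathrm{cost}/\eta}$. A minimal sketch follows; the regularization parameter $\eta$ and the fixed iteration count here are our assumptions for illustration, not the parameter settings used in the experiments above.

```python
import numpy as np

def sinkhorn(cost, supply, demand, eta=0.05, iters=500):
    """Plain Sinkhorn scaling (a sketch of the baseline, not of our
    algorithm). Returns the transport plan diag(u) K diag(v)."""
    # Gibbs kernel; its entries underflow for small eta, which is the
    # kind of numerical instability discussed above for small delta.
    K = np.exp(-cost / eta)
    u = np.ones(len(supply))
    v = np.ones(len(demand))
    for _ in range(iters):
        u = supply / (K @ v)      # scale rows toward the supply marginal
        v = demand / (K.T @ u)    # scale columns toward the demand marginal
    return u[:, None] * K * v[None, :]
```

Each pass of the loop corresponds to one "iteration" in the sense used in the comparison above, i.e., an $O(n^2)$ amount of work.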
We repeat the same experimental setup, except with Sinkhorn receiving an error parameter of $5\delta$ while our algorithm continues to receive $\delta$. Our algorithm continues to have a better average execution time than Sinkhorn (plot on the right in Figure 2(b)). Moreover, we also observe that the cost of the solution produced by Sinkhorn is higher than that of the solution produced by our algorithm for every run, across all values of $\delta$. In other words, for this experiment, our algorithm executed faster and produced a better-quality solution than the Sinkhorn algorithm.

4 Extensions and Conclusion

We presented an $O(n^2(C/\delta) + n(C/\delta)^2)$ time algorithm to compute a $\delta$-close approximation of the optimal transport. Our algorithm is an execution of a single scale of Gabow and Tarjan's algorithm for appropriately scaled integer demands, supplies, and costs. Our key technical contribution is a diameter-sensitive analysis of the execution time of this algorithm.

In [29, Sections 3.1, 3.2], it has been shown that the first and second steps of our algorithm, i.e., the Hungarian search and the partial DFS, can be executed on a residual graph with costs $\bar{c}(\cdot,\cdot)$ in $O(n\Phi(n)\log^2 n)$ and $O(n\Phi(n))$ time, respectively; here $\Phi(n)$ is the query and update time of a dynamic weighted nearest neighbor data structure with respect to the cost function $\bar{c}(\cdot,\cdot)$. Unlike in [29], where the costs, after scaling down by a factor of $\delta'$, are rounded up, $\bar{c}(\cdot,\cdot)$ is obtained by scaling down $c(\cdot,\cdot)$ by $\delta'$ and rounding down. This, however, does not affect the critical lemma [29, Lemma 3.2], and the result continues to hold. Several distances, including the Euclidean distance and the squared Euclidean distance, admit such a dynamic weighted nearest neighbor data structure for planar point sets with poly-logarithmic query and update time [2, 16].
Therefore, we immediately obtain an $\widetilde{O}(n(C/\delta)^2)$ time algorithm to compute a $\delta$-close approximation of the optimal transport for such distances. In [1, Section 4 (i)–(iii)], a similar link is made between an approximate nearest neighbor (ANN) data structure and a relative approximation algorithm.

The Sinkhorn algorithm scales well in parallel settings because the row and column update operations within each iteration can be easily parallelized. In its first step, our algorithm uses Dijkstra's method, which is inherently sequential. However, there is an alternate implementation of a single scale of Gabow and Tarjan's algorithm that does not require Dijkstra's search (see [19, Section 2.1]) and may be more amenable to parallel implementation. We conclude with the following open questions:

• Can we design a combinatorial algorithm that uses an approximate nearest neighbor data structure and produces a $\delta$-close transport plan for geometric costs in $\widetilde{O}(n(C/\delta))$ time?
• Can we design a parallel combinatorial approximation algorithm that produces a $\delta$-close optimal transport for arbitrary costs?

Acknowledgements: Research presented in this paper was funded by NSF CCF-1909171. We would like to thank the anonymous reviewers for their useful feedback. All authors contributed equally to this research.

References

[1] P. K. Agarwal and R. Sharathkumar. Approximation algorithms for bipartite matching with metric and geometric costs. In ACM Symposium on Theory of Computing, pages 555–564, 2014.

[2] P. K. Agarwal, A. Efrat, and M. Sharir. Vertical decomposition of shallow levels in 3-dimensional arrangements and its applications. SIAM J. Comput., 29(3):912–953, 1999.

[3] J. Altschuler, J. Weed, and P. Rigollet. Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration.
In Neural Information Processing Systems, pages 1961–1971, 2017.

[4] J. Altschuler, F. Bach, A. Rudi, and J. Weed. Approximating the quadratic transportation metric in near-linear time. arXiv:1810.10046 [cs.DS], 2018.

[5] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein GAN. arXiv:1701.07875v3 [stat.ML], 2017.

[6] J.-D. Benamou, G. Carlier, M. Cuturi, L. Nenna, and G. Peyré. Iterative Bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37(2):A1111–A1138, 2015.

[7] J. Bigot, R. Gouet, T. Klein, A. López, et al. Geodesic PCA in the Wasserstein space by convex PCA. In Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, volume 53, pages 1–26. Institut Henri Poincaré, 2017.

[8] J. Blanchet, A. Jambulapati, C. Kent, and A. Sidford. Towards optimal running times for optimal transport. arXiv:1810.07717 [cs.DS], 2018.

[9] M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Neural Information Processing Systems, pages 2292–2300, 2013.

[10] M. Cuturi and A. Doucet. Fast computation of Wasserstein barycenters. In International Conference on Machine Learning, pages 685–693, 2014.

[11] P. Dvurechensky, A. Gasnikov, and A. Kroshnin. Computational optimal transport: Complexity by accelerated gradient descent is better than by Sinkhorn's algorithm. In International Conference on Machine Learning, pages 1366–1375, 2018.

[12] J. Edmonds and R. M. Karp. Theoretical improvements in algorithmic efficiency for network flow problems. J. ACM, 19(2):248–264, 1972.

[13] R. Flamary, M. Cuturi, N. Courty, and A. Rakotomamonjy. Wasserstein discriminant analysis. Machine Learning, 107(12):1923–1945, 2018.

[14] H. N. Gabow and R. Tarjan. Faster scaling algorithms for network problems. SIAM J. Comput., 18:1013–1036, October 1989.

[15] A.
Jambulapati, A. Sidford, and K. Tian. A direct $\widetilde{O}(1/\varepsilon)$ iteration parallel algorithm for optimal transport. arXiv:1906.00618 [cs.DS], 2019.

[16] H. Kaplan, W. Mulzer, L. Roditty, P. Seiferth, and M. Sharir. Dynamic planar Voronoi diagrams for general distance functions and their algorithmic applications. In ACM/SIAM Symposium on Discrete Algorithms, pages 2495–2504, 2017.

[17] A. B. Khesin, A. Nikolov, and D. Paramonov. Preconditioning for the geometric transportation problem. In Symposium on Computational Geometry, pages 15:1–15:14, 2019.

[18] H. Kuhn. Variants of the Hungarian method for assignment problems. Naval Research Logistics, 3(4):253–258, 1956.

[19] N. Lahn and S. Raghvendra. A faster algorithm for minimum-cost bipartite matching in minor-free graphs. In ACM/SIAM Symposium on Discrete Algorithms, pages 569–588, 2019.

[20] N. Lahn and S. Raghvendra. A weighted approach to the maximum cardinality bipartite matching problem with applications in geometric settings. In Symposium on Computational Geometry, pages 48:1–48:13, 2019.

[21] Y. T. Lee and A. Sidford. Path finding methods for linear programming: Solving linear programs in $\widetilde{O}(\sqrt{\mathrm{rank}})$ iterations and faster algorithms for maximum flow. In IEEE Foundations of Computer Science, pages 424–433, 2014.

[22] T. Lin, N. Ho, and M. I. Jordan. On efficient optimal transport: An analysis of greedy and accelerated mirror descent algorithms. arXiv:1901.06482 [cs.DS], 2019.

[23] M. Mucha and P. Sankowski. Maximum matchings via Gaussian elimination. In IEEE Foundations of Computer Science, pages 248–255, 2004.

[24] J. M. Phillips and P. K. Agarwal. On bipartite matching under the RMS distance. In Canadian Conference on Computational Geometry, 2006.

[25] K. Quanrud. Approximating optimal transport with linear programs.
In Symposium on Simplicity of Algorithms, volume 69, pages 6:1–6:9, 2019.

[26] L. Ramshaw and R. E. Tarjan. A weight-scaling algorithm for min-cost imperfect matchings in bipartite graphs. In IEEE Foundations of Computer Science, pages 581–590, 2012.

[27] R. Sandler and M. Lindenbaum. Nonnegative matrix factorization with earth mover's distance metric for image analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1590–1602, 2011.

[28] R. Sharathkumar and P. K. Agarwal. A near-linear time approximation algorithm for geometric bipartite matching. In ACM Symposium on Theory of Computing, pages 385–394, 2012.

[29] R. Sharathkumar and P. K. Agarwal. Algorithms for the transportation problem in geometric settings. In ACM/SIAM Symposium on Discrete Algorithms, pages 306–317, 2012.

[30] J. Sherman. Generalized preconditioning and undirected minimum-cost flow. In ACM/SIAM Symposium on Discrete Algorithms, pages 772–780, 2017.