{"title": "Faster width-dependent algorithm for mixed packing and covering LPs", "book": "Advances in Neural Information Processing Systems", "page_first": 15279, "page_last": 15288, "abstract": "In this paper, we give a faster width-dependent algorithm for mixed packing-covering LPs. Mixed packing-covering LPs are fundamental to combinatorial optimization in computer science and operations research. Our algorithm finds a $1+\\eps$ approximate solution in time $O(Nw/ \\varepsilon)$, where $N$ is number of nonzero entries in the constraint matrix, and $w$ is the maximum number of nonzeros in any constraint. This algorithm is faster than Nesterov's smoothing algorithm which requires $O(N\\sqrt{n}w/ \\eps)$ time, where $n$ is the dimension of the problem. Our work utilizes the framework of area convexity introduced in [Sherman-FOCS'17] to obtain the best dependence on $\\varepsilon$ while breaking the infamous $\\ell_{\\infty}$ barrier to eliminate the factor of $\\sqrt{n}$. The current best width-independent algorithm for this problem runs in time $O(N/\\eps^2)$ [Young-arXiv-14] and hence has worse running time dependence on $\\varepsilon$. Many real life instances of mixed packing-covering problems exhibit small width and for such cases, our algorithm can report higher precision results when compared to width-independent algorithms. As a special case of our result, we report a $1+\\varepsilon$ approximation algorithm for the densest subgraph problem which runs in time $O(md/ \\varepsilon)$, where $m$ is the number of edges in the graph and $d$ is the maximum graph degree.", "full_text": "Faster Width-dependent Algorithm for Mixed\n\nPacking and Covering LPs\n\nDigvijay Boob\nGeorgia Tech\nAtlanta, GA\n\ndigvijaybb40@gatech.edu\n\nSaurabh Sawlani\n\nGeorgia Tech\nAtlanta, GA\n\nsawlani@gatech.edu\n\nDi Wang\u02da\nGoogle AI\nAtlanta, GA\n\nwadi@google.com\n\nAbstract\n\n?\n\nIn this paper, we give a faster width-dependent algorithm for mixed packing-\ncovering LPs. 
Mixed packing-covering LPs are fundamental to combinatorial\noptimization in computer science and operations research. Our algorithm \ufb01nds\na 1 ` \u03b5 approximate solution in time OpN w{\u03b5q, where N is number of nonzero\nentries in the constraint matrix, and w is the maximum number of nonzeros in any\nconstraint. This algorithm is faster than Nesterov\u2019s smoothing algorithm which\nrequires OpN\nnw{\u03b5q time, where n is the dimension of the problem. Our work\nutilizes the framework of area convexity introduced in [Sherman-FOCS\u201917] to ob-\n?\ntain the best dependence on \u03b5 while breaking the infamous (cid:96)8 barrier to eliminate\nthe factor of\nn. The current best width-independent algorithm for this problem\nruns in time OpN{\u03b52q [Young-arXiv-14] and hence has worse running time depen-\ndence on \u03b5. Many real life instances of mixed packing-covering problems exhibit\nsmall width and for such cases, our algorithm can report higher precision results\nwhen compared to width-independent algorithms. As a special case of our result,\nwe report a 1` \u03b5 approximation algorithm for the densest subgraph problem which\nruns in time Opmd{\u03b5q, where m is the number of edges in the graph and d is the\nmaximum graph degree.\n\n1\n\nIntroduction\n\nMixed packing and covering linear programs (LPs) are a natural class of LPs where coef\ufb01cients,\nvariables, and constraints are non-negative. They model a wide range of important problems in\ncombinatorial optimization and operations research. In general, they model any problem which\ncontains a limited set of available resources (packing constraints) and a set of demands to ful\ufb01ll\n(covering constraints).\nTwo special cases of the problem have been widely studied in literature: pure packing, formulated as\nmaxxtbT x | P x \u010f pu; and pure covering, formulated as minxtbT x | Cx \u011b cu where P, p, C, c, b\nare all non-negative. 
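As a concrete illustration of the two pure formulations above, the following sketch solves a tiny pure-packing instance $\max_x\{b^T x \mid Px \le p,\ x \ge 0\}$ with an off-the-shelf LP solver (SciPy's `linprog`; the instance data is illustrative only — the algorithms discussed in this paper exist precisely to avoid the per-iteration cost of general LP solvers):

```python
import numpy as np
from scipy.optimize import linprog

# Tiny pure-packing instance: maximize b^T x  subject to  P x <= p, x >= 0.
# linprog minimizes, so we negate the objective.
P = np.array([[1.0, 2.0],
              [3.0, 1.0]])
p = np.array([4.0, 6.0])
b = np.array([1.0, 1.0])

res = linprog(c=-b, A_ub=P, b_ub=p, bounds=[(0, None)] * 2, method="highs")
assert res.status == 0
# For this instance the maximizer is (1.6, 1.2) with value 2.8.
print(res.x, -res.fun)
```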
These are known to model fundamental problems such as maximum bipartite\ngraph matching, minimum set cover, etc. [LN93]. Algorithms to solve packing and covering LPs have\nalso been applied to great effect in designing \ufb02ow control systems [BBR04], scheduling problems\n[PST95], zero-sum matrix games [Nes05] and in mechanism design [ZN01]. In this paper, we study\nthe mixed packing and covering (MPC) problem, formulated as checking the feasibility of the set:\ntx | P x \u010f p, Cx \u011b cu, where P, C, p, c are non-negative. We say that x is an \u03b5-approximate solution\nto MPC if it belongs to the relaxed set tx | P x \u010f p1` \u03b5qp, Cx \u011b p1\u00b4 \u03b5qcu. MPC is a generalization\nof pure packing and pure covering, hence it is applicable to a wider range of problems such as\nmulti-commodity \ufb02ow on graphs [You01, She17], non-negative linear systems and X-ray tomography\n[You01].\n\n\u02daWork done when author was at Georgia Tech.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\fGeneral LP solving techniques such as the interior point method can approximate solutions to MPC in\nas few as Oplogp1{\u03b5qq iterations - however, they incur a large per-iteration cost. In contrast, iterative\napproximation algorithms based on \ufb01rst-order optimization methods require polyp1{\u03b5q iterations, but\nthe iterations are fast and in most cases are conducive to ef\ufb01cient parallelization. This property is of\nutmost importance in the context of ever-growing datasets and the availability of powerful parallel\ncomputers, resulting in much faster algorithms in relatively low-precision regimes.\n\n1.1 Previous work\n\nIn literature, algorithms for the MPC problem can be grouped into two broad categories: width-\ndependent and width-independent. 
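The relaxed set defining an $\varepsilon$-approximate solution is straightforward to test; a minimal sketch (the instance data is chosen arbitrarily for illustration):

```python
import numpy as np

def is_eps_approx(P, p, C, c, x, eps):
    """Check membership in the relaxed set {x : Px <= (1+eps)p, Cx >= (1-eps)c}."""
    return bool(np.all(P @ x <= (1 + eps) * p) and np.all(C @ x >= (1 - eps) * c))

P = np.array([[1.0, 1.0]]); p = np.array([1.0])
C = np.array([[1.0, 0.0]]); c = np.array([0.5])
x = np.array([0.52, 0.5])

# Px = 1.02 <= 1.05 and Cx = 0.52 >= 0.475, so x is 0.05-approximate.
print(is_eps_approx(P, p, C, c, x, eps=0.05))  # True
```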
Here, width is an intrinsic property of a linear program which typically depends on the dimensions and the largest entry of the constraint matrix, and is an indication of the range of values any constraint can take. In the context of this paper and the MPC problem, we define $w_P$ and $w_C$ as the maximum number of non-zeros in any constraint in $P$ and $C$ respectively. We define the width of the LP as $w \stackrel{\mathrm{def}}{=} \max(w_P, w_C)$.
One of the first approaches used to solve LPs was Lagrangian relaxation: replacing hard constraints with loss functions which enforce the same constraints indirectly. Using this approach, Plotkin, Shmoys and Tardos [PST95], and Grigoriadis and Khachiyan [GK96] obtained width-dependent polynomial-time approximation algorithms for MPC. Luby and Nisan [LN93] gave the first width-independent parallelizable algorithm for pure packing and pure covering, which ran in $\tilde{O}(\varepsilon^{-4})$ parallel time and $\tilde{O}(N\varepsilon^{-4})$ total work. Here, parallel time (sometimes termed depth) refers to the longest chain of dependent operations, and work refers to the total number of operations in the algorithm. Young [You01] extended this technique to give the first width-independent parallel algorithm for MPC, which runs in $\tilde{O}(\varepsilon^{-4})$ parallel time and $\tilde{O}(md\varepsilon^{-2})$ total work². Young [You14] later improved his algorithm to run using total work $\tilde{O}(N\varepsilon^{-2})$. Mahoney et al. [MRWZ16] later gave an algorithm with a faster parallel run-time of $\tilde{O}(\varepsilon^{-3})$.
The other most prominent approach in literature towards solving an LP is to convert it into a smooth function [Nes05] and then apply general first-order optimization techniques [Nes05, Nes12]. Although the dependence on $\varepsilon$ from using first-order techniques is much improved, it usually comes at the cost of sub-optimal dependence on the input size and width.
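Under the definition above, the width $w = \max(w_P, w_C)$ depends only on the sparsity pattern of the constraint matrices; a minimal sketch using SciPy sparse matrices (the instance data is illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix

def width(P, C):
    """w = max(w_P, w_C): the maximum number of nonzeros in any constraint (row)."""
    row_nnz = lambda A: int(A.getnnz(axis=1).max())
    return max(row_nnz(P), row_nnz(C))

P = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 3.0, 0.0]]))   # w_P = 2
C = csr_matrix(np.array([[1.0, 1.0, 1.0]]))   # w_C = 3

print(width(P, C))  # 3
```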
For the MPC problem, Nesterov's accelerated method [Nes12], as well as Bienstock and Iyengar's adaptation [BI06] of Nesterov's smoothing [Nes05], give rise to algorithms whose runtime depends linearly on $\varepsilon^{-1}$, but with far from optimal dependence on input size and width. For pure packing and pure covering problems, however, Allen-Zhu and Orecchia [AO19] were the first to incorporate Nesterov-like acceleration while still obtaining near-linear width-independent runtimes, giving a $\tilde{O}(N\varepsilon^{-1})$ time algorithm for the packing problem. For the covering problem, they gave a $\tilde{O}(N\varepsilon^{-1.5})$ time algorithm, which was then improved to $\tilde{O}(N\varepsilon^{-1})$ by [WRM16]. Importantly, however, the above algorithms do not generalize to MPC.

1.2 Our contributions

We give the best parallel width-dependent algorithm for MPC, while only incurring a linear dependence on $\varepsilon^{-1}$ in the parallel runtime and total work. Additionally, the total work has near-linear dependence on the input size. Formally, we state our main theorem as follows.
Theorem 1.1.
There exists a parallel $\varepsilon$-approximation algorithm for the mixed packing covering problem, which runs in $\tilde{O}(w \cdot \varepsilon^{-1})$ parallel time, while performing $\tilde{O}(w \cdot N \cdot \varepsilon^{-1})$ total work, where $N$ is the total number of non-zeros in the constraint matrices, and $w$ is the width of the given LP.

Table 1 compares the running time of our algorithm to previous works solving this problem. Sacrificing width independence for faster convergence with respect to precision proves to be a valuable trade-off for several combinatorial optimization problems which naturally have a low width. Prominent examples of such problems which are not pure packing or covering problems include multicommodity flow and densest subgraph, where the width is bounded by the degree of a vertex. In a large number of real-world graphs, the maximum vertex degree is usually small, hence our algorithm proves to be much faster when we want high-precision solutions. We explicitly show that this result directly gives the fastest algorithm for the densest subgraph problem on low-degree graphs in Appendix C.

Table 1: Comparison of runtimes of $\varepsilon$-approximation algorithms for the mixed packing covering problem.

Algorithm                      | Total Work                                        | Parallel Runtime                        | Comments
Young [You01]                  | $\tilde{O}(md\varepsilon^{-2})$                   | $\tilde{O}(\varepsilon^{-4})$           | $d$ is column-width²
Bienstock and Iyengar [BI06]   | $\tilde{O}(n^{2.5} w_P^{1.5} w \varepsilon^{-1})$ | —                                       | width-dependent
Nesterov [Nes12]               | $\tilde{O}(w N \sqrt{n}\,\varepsilon^{-1})$       | $\tilde{O}(w\sqrt{n}\,\varepsilon^{-1})$ | width-dependent
Young [You14]                  | $\tilde{O}(N\varepsilon^{-2})$                    | $\tilde{O}(\varepsilon^{-4})$           |
Mahoney et al. [MRWZ16]        | $\tilde{O}(N\varepsilon^{-3})$                    | $\tilde{O}(\varepsilon^{-3})$           |
This paper                     | $\tilde{O}(w N \varepsilon^{-1})$                 | $\tilde{O}(w\varepsilon^{-1})$          | width-dependent

²Here, $d$ is the maximum number of constraints that any variable appears in.

2 Notation and Definitions

For any integer $q$, we represent using $\|\cdot\|_q$ the $q$-norm of any vector.
We represent the in\ufb01nity-norm as\n(cid:107)\u00a8(cid:107)8. We denote the in\ufb01nity-norm ball (sometimes called the (cid:96)8 ball) as the set Bn8prq def\u201c tx P Rn :\n\u0159\n(cid:107)x(cid:107)8 \u010f ru. The nonnegative part of this ball is denoted as Bn`,8prq \u201c tx P Rn : x \u011b 0n,(cid:107)x(cid:107)8 \u010f\nru. For radius r \u201c 1, we drop the radius speci\ufb01cation and use the short notation Bn8 and Bn`,8. We\ni\u201c1 xi \u010f 1u. For any y \u011b 0k,\ndenote the extended simplex of dimension k as \u2206`\npyq \u201c y{(cid:107)y(cid:107)1 if (cid:107)y(cid:107)1 \u011b 1. Further, for any set K, we represent its interior, relative interior\nproj\u2206`\nand closure as intpKq, relintpKq and clpKq, respectively. The function exp is applied to a vector\nelement wise. The division of two vectors of same dimension is also performed element wise.\nFor any matrix A, we use nnzpAq to denote the number of nonzero entries in it. We use Ai,: and A:,j\nto refer to the ith row and jth column of A respectively. We use notation Aij (or Ai,j alternatively)\nto denote an element in the i-th row and j-th column of matrix A. (cid:107)A(cid:107)8 denotes the operator norm\n(cid:107)A(cid:107)8\u00d18 def\u201c supx\u20300\n(cid:107)Ax(cid:107)8\n(cid:107)x(cid:107)8 . For a symmetric matrix A and an antisymmetric matrix B, we de\ufb01ne\nan operator \u013ei as A \u013ei B \u00f4\n\nis positive semi-de\ufb01nite.\n\ndef\u201c tx P Rk :\n\n\uf6be\n\n\u201e\n\nk\n\nk\n\nk\n\nA \u00b4B\nB A\n\nWe formally de\ufb01ne an \u03b5-approximate solution to the mixed packing-covering (MPC) problem as\nfollows.\nDe\ufb01nition 2.1. 
We say that x is an \u03b5-approximate solution of the mixed packing-covering problem\nif x satis\ufb01es x P Bn`,8, P x \u010f p1 ` \u03b5q1p and Cx \u011b p1 \u00b4 \u03b5q1c.\nHere, 1k denotes a vectors of 1\u2019s of dimension k for any integer k.\nThe saddle point problem on two sets x P X and y P Y can be de\ufb01ned as follows:\n\nmin\nxPX\n\nmax\nyPY\n\n(1)\nwhere Lpx, yq is some bilinear form between x and y. For this problem, we de\ufb01ne the primal-dual\n\ngap function as suppsx,syqPX\u02c6Y Lpx,syq \u00b4 Lpsx, yq. This gap function can be used as measure of\nDe\ufb01nition 2.2. We say that px, yq P X\u02c6Y is an \u03b5-optimal solution for (1) if suppsx,syqPX\u02c6Y Lpx,syq\u00b4\nLpsx, yq \u010f \u03b5.\n\naccuracy of the above saddle point solution.\n\nLpx, yq,\n\n3 Technical overview\n\nThe mixed packing-covering (MPC) problem is formally de\ufb01ned as follows.\nGiven two nonnegative matrices P P Rp\u02c6n, C P Rc\u02c6n, \ufb01nd an x P Rn, x \u011b 0,(cid:107)x(cid:107)8 \u010f 1 such that\nP x \u010f 1p and Cx \u011b 1c if it exists, otherwise report infeasibility.\n\n3\n\n\fNote that the vector of 1\u2019s on the right hand side of the packing and covering constraints can be\nobtained by simply scaling each constraint appropriately. We also assume that each entry in the\nmatrices P and C is at most 1. This assumption, and subsequently the (cid:96)8 constraints on x also cause\nno loss of generality3.\nWe reformulate MPC as a saddle point problem, as de\ufb01ned in Section 2;\n\nyP\u2206`\n\nmax\nc , zP\u2206`\n\np\n\nLpx, y, zq,\n\n(2)\n\n\u201e\n\n\u03bb\u02da def\u201c min\n\uf6be\u201e\nxPBn`,8\n\nP \u00b41p\n\u00b4C\n1c\n\n\uf6be\n\np\n\nx\n1\n\nc ,zP\u2206`\n\n. The relation between the two formulations is shown\n\nwhere Lpx, y, zq :\u201c ryT zTs\nin Section 4. For the rest of the paper, we focus on the saddle point formulation (2).\n\u03b7pxq def\u201c maxyP\u2206`\nLpx, y, zq is a piecewise linear convex function. 
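Because $L(x, y, z)$ is linear in $(y, z)$ and the extended simplex $\Delta^+$ contains the origin, the inner maximization defining $\eta$ has a closed form: $\max_{y \in \Delta^+} a^T y = \max(0, \max_i a_i)$. A minimal sketch of this inner oracle, assuming the constraints have already been scaled so the right-hand sides are all-ones vectors, as in the formulation above:

```python
import numpy as np

def eta(P, C, x):
    """eta(x) = max over (y, z) in the extended simplices of
    y^T (Px - 1) + z^T (1 - Cx); each factor maximizes in closed form."""
    pack = P @ x - 1.0          # packing violations  P x - 1_p
    cov  = 1.0 - C @ x          # covering violations 1_c - C x
    return max(0.0, pack.max()) + max(0.0, cov.max())

P = np.array([[0.5, 0.5]])
C = np.array([[1.0, 0.0]])
x = np.array([1.0, 1.0])

print(eta(P, C, x))  # 0.0: this x satisfies Px <= 1 and Cx >= 1
```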
Assuming oracle access to this "inner" maximization problem, the "outer" problem of minimizing $\eta(x)$ can be performed using first order methods like mirror descent, which are suitable when the underlying problem space is the unit $\ell_\infty$ ball. One drawback of this class of methods is their rate of convergence, which is standard for non-accelerated first order methods on non-differentiable objectives: $O(\frac{1}{\varepsilon^2})$ iterations to obtain an $\varepsilon$-approximate minimizer $x$ of $\eta$ which satisfies $\eta(x) \le \eta^* + \varepsilon$, where $\eta^*$ is the optimal value. This means that the algorithm needs to access the inner maximization oracle $O(\frac{1}{\varepsilon^2})$ times, which can become prohibitively large in the high precision regime.
Note that even though $\eta$ is a piecewise linear non-differentiable function, it is not a black box function, but a maximization of linear functions in $x$. This structure can be exploited using Nesterov's smoothing technique [Nes05]. In particular, $\eta(x)$ can be approximated by choosing a strongly convex function $\phi : \Delta^+_p \times \Delta^+_c \to \mathbb{R}$ and considering
$$\tilde{\eta}(x) = \max_{y \in \Delta^+_p,\, z \in \Delta^+_c} L(x, y, z) - \phi(y, z).$$
This strongly convex regularization yields that $\tilde{\eta}$ is a Lipschitz-smooth⁴ convex function. If $L$ is the constant of Lipschitz smoothness of $\tilde{\eta}$, then application of any of the accelerated gradient methods in literature will converge in $O(\sqrt{L/\varepsilon})$ iterations. Moreover, it can also be shown that in order to construct a smooth $\varepsilon$-approximation $\tilde{\eta}$ of $\eta$, the Lipschitz smoothness constant $L$ can be chosen to be of the order $O(1/\varepsilon)$, which in turn implies an overall convergence rate of $O(1/\varepsilon)$.
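As a toy instance of this smoothing idea: the non-smooth function $\max_i a_i$ (a maximum of linear functions, as above) regularized by negative entropy over the simplex yields the familiar log-sum-exp, with additive error at most $\mu \log k$ for smoothing parameter $\mu$. A sketch (not the paper's regularizer, just the standard one-dimensional illustration):

```python
import numpy as np

def smooth_max(a, mu):
    """Entropy-smoothed maximum: mu * log(sum(exp(a / mu))).
    Satisfies max(a) <= smooth_max(a, mu) <= max(a) + mu * log(len(a))."""
    m = a.max()  # shift for numerical stability
    return m + mu * np.log(np.exp((a - m) / mu).sum())

a = np.array([0.2, 1.0, 0.5])
for mu in (1.0, 0.1, 0.01):
    f = smooth_max(a, mu)
    # the smoothed value brackets the true max within mu * log(k)
    assert a.max() <= f <= a.max() + mu * np.log(a.size) + 1e-12

print(round(smooth_max(a, 0.01), 4))  # approaches max(a) = 1.0 as mu -> 0
```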
In particular,\nNesterov\u2019s smoothing achieves an oracle complexity of Opp(cid:107)P(cid:107)8 ` (cid:107)C(cid:107)8qDx maxtDy, Dzu\u03b5\u00b41q,\nwhere where Dx, Dy and Dz denote the sizes of the ranges of their respective regularizers which\nare strongly convex functions. Dy and Dz can be made of the order of log p and log c, respectively.\nHowever, Dx can be problematic since x belongs to an (cid:96)8 ball. More on this will soon follow.\nNesterov\u2019s dual extrapolation algorithm[Nes07] gives a very similar complexity but is a different\nalgorithm in that it directly addresses the saddle point formulation (2) rather than viewing the problem\nas optimizing a non-smooth function \u03b7. The \ufb01nal convergence for the dual extrapolation algorithm\nis given in terms of the primal-dual gap function of the saddle point problem (2). This algorithms\nviews the saddle point problem as solving variational inequality for an appropriate monotone operator\nin joint domain px, y, zq. Moreover, as opposed to smoothing techniques which only regularize the\ndual, this algorithm regularizes both primal and dual parts (joint regularization), hence is a different\nscheme altogether.\nNote that for both schemes mentioned above, the maximization oracle itself has an analytical\nexpression which involves matrix-vector multiplication. Hence each call to the oracle incurs a\nsequential run-time of nnzpPq ` nnzpCq. Then, overall complexity for both schemes is of order\nOppnnzpPq ` nnzpCqqp(cid:107)P(cid:107)8 ` (cid:107)C(cid:107)8qDx maxtDy, Dzu\u03b5\u00b41q.\n\np\n\n3This transformation can be achieved by adapting techniques from [WRM16] while increasing dimension of\nthe problem up to a logarithmic factor. Details of this fact are in Appendix B in the full version of this paper. 
For the purpose of the main text, we work with this assumption.
⁴Definitions of Lipschitz-smoothness and strong convexity can be found in many texts in nonlinear programming and machine learning, e.g. [Bub14]. Intuitively, $f$ is Lipschitz-smooth if the rate of change of $\nabla f$ can be bounded by a quantity known as the "constant of Lipschitz smoothness".

The $\ell_\infty$ barrier

Note that the first method, i.e., Nesterov's smoothing technique, has known lower bounds due to [GN15] (see Corollary 1 in their paper). According to their result, the framework of Nesterov's smoothing has a known limitation since it only regularizes the dual variables. As opposed to this, Nesterov's dual extrapolation regularizes both primal and dual variables, and has the potential to skip the aforementioned lower bounds of [GN15]. However, the complexity result of this method involves a $D_x$ term, which denotes the range of a convex function over the domain of $x$. The following lemma states a lower bound for this range in the case of $\ell_\infty$ balls.
Lemma 3.1. Any strongly convex function has a range of at least $\Omega(\sqrt{n})$ on any $\ell_\infty$ ball.
Since $D_x = \Omega(\sqrt{n})$ for each member function of this wide class, there is no hope of eliminating this $\sqrt{n}$ factor using techniques involving explicit use of strong convexity.
So, the goal now is to find a joint regularization function with a small range over $\ell_\infty$ balls which still acts as a good enough regularizer to enable accelerated convergence of the descent algorithm. In pursuit of breaking this $\ell_\infty$ barrier, we draw inspiration from the notion of area convexity introduced by Sherman [She17]. Area convexity is a weaker notion than strong convexity; however, it is still strong enough to ensure that accelerated first order methods still go through when using area convex regularizers.
Since this is a weaker notion than strong convexity, we can construct area convex\nfunctions which have range of Opnop1qq on (cid:96)8 ball.\nFirst, we de\ufb01ne area convexity, and then go on to mention its relevance to the saddle point problem\n(2).\nArea convexity is a notion de\ufb01ned in context of a matrix A P Ra\u02c6b and a convex set K \u010e Ra`b. Let\nMA\n\n\u201e\n\n\uf6be\n\ndef\u201c\n\n.\n\n0b\u02c6b \u00b4AT\n0a\u02c6a\nA\n\nDe\ufb01nition 3.2 ([She17]). A function \u03c6 is area convex with respect to a matrix A on a convex set K iff\npv \u00b4 uqT MApu\u00b4 tq.\nfor any t, u, v P K, \u03c6 satis\ufb01es \u03c6\n\n\u03c6ptq` \u03c6puq` \u03c6pvq\n\n` t ` u ` v\n\n\u02d8\n\n3\n\n`\n\n\u010f 1\n3\n\n\u02d8\n\n?\n\u00b4 1\n3\n\n3\n\n`\n\n\u201e\n\n\uf6be\n0 \u00b41\n0\n1\n\n2p\u03c6ptq ` \u03c6puqq exceeds \u03c6p 1\n\nTo understand the de\ufb01nition above, let us \ufb01rst look at the notion of strong convexity. \u03c6 is said to\n2pt ` uqq by an amount\nbe strongly convex if for any two points t, u, 1\nproportional to (cid:107)t \u00b4 u(cid:107)2\n2. De\ufb01nition 3.2 generalizes this notion in context of matrix A for any three\npoints x, y, z. \u03c6 is area-convex on set K if for any three points t, u, v P K, we have 1\n3p\u03c6ptq` \u03c6puq`\n3pt ` u ` vqq by an amount proportional to the area of the triangle de\ufb01ned by the\n\u03c6pvqq exceeds \u03c6p 1\nconvex hull of t, u, v.\nConsider the case that points t, u, v are collinear. For this case, the area term (i.e., the term involving\nMA) in De\ufb01nition 3.2 is 0 since matrix MA is antisymmetric. In this sense, area convexity is even\nweaker than strict convexity. Moreover, the notion of area is parameterized by matrix A. To see\na speci\ufb01c example of this notion of area, consider A \u201c\nand t, u, v P R2. Then, for all\npossible permutations of t, u, v, the area term takes a value equal to \u02d8pt1pu2 \u00b4 v2q ` u1pv2 \u00b4\nt2q ` v1pt2 \u00b4 u2qq. 
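The 2-dimensional example above can be checked numerically: with the antisymmetric $A = \begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix}$ acting directly as the bilinear form, the area term $(v-u)^T A (u-t)$ equals, up to sign, twice the area of the triangle with vertices $t, u, v$. A small verification sketch:

```python
import numpy as np
from itertools import permutations

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

def area_term(t, u, v):
    """The bilinear 'area' term (v - u)^T A (u - t) from the definition."""
    return float((v - u) @ A @ (u - t))

def triangle_area(t, u, v):
    # shoelace formula for the triangle with vertices t, u, v
    return 0.5 * abs(t[0] * (u[1] - v[1]) + u[0] * (v[1] - t[1]) + v[0] * (t[1] - u[1]))

t, u, v = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])
# over all orderings, |area term| is always twice the triangle's area
for a, b, c in permutations([t, u, v]):
    assert abs(abs(area_term(a, b, c)) - 2 * triangle_area(t, u, v)) < 1e-12
print(2 * triangle_area(t, u, v))  # 1.0
```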
Since the condition holds irrespective of the permutation so we must have that\n|t1pu2 \u00b4 v2q ` u1pv2 \u00b4 t2q ` v1pt2 \u00b4 u2q|. But note\n\u03c6p t`u`v\n2|t1pu2 \u00b4 v2q ` u1pv2 \u00b4 t2q ` v1pt2 \u00b4 u2q|.\nthat area of triangle formed by points t, u, v is equal to 1\nHence the area term is just a high dimensional matrix based generalization of the area of a triangle.\nComing back to the saddle point problem (2), we need to pick a suitable area convex function \u03c6\non the set Bn`,8 \u02c6 \u2206`\nc . Since \u03c6 is de\ufb01ned on the joint space, it has the property of joint\nregularization vis a vis (2). However, we need an additional parameter: a suitable matrix MA. The\nchoice of this matrix is related to the bilinear form of the primal-dual gap function of (2). We delve\ninto the technical details of this in Section 4, however, we state that the matrix is composed of P, C\nand some additional constants. The algorithm we state exactly follows Nesterov\u2019s dual extrapolation\nmethod described earlier. One notable difference is that in [Nes07], they consider joint regularization\nby a strongly convex function which does not depend on the problem matrices P, C but only on the\nconstraint set Bn`,8 \u02c6 \u2206`\nc . Our area convex regularizer, on the other hand, is tailor made for\nthe particular problem matrices P, C as well as the constraint set.\n\n\u02d8\n\u03c6ptq ` \u03c6puq ` \u03c6pvq\n\np \u02c6 \u2206`\n\np \u02c6 \u2206`\n\n\u00b4 1\n?\n3\n\nq \u010f 1\n\n3\n\n3\n\n3\n\n5\n\n\f4 Area Convexity for Mixed Packing Covering LPs\n\nIn this section, we present our technical results and algorithm for the MPC problem, with the end goal\nof proving Theorem 1.1. First, we relate an p1 ` \u03b5q-approximate solution to the saddle point problem\nto an \u03b5-approximate solution to MPC. 
Next, we present some theoretical background towards the\ngoal of choosing and analyzing an appropriate area-convex regularizer in the context of the saddle\npoint formulation, where the key requirement of the area convex function is to obtain a provable and\nef\ufb01cient convergence result. Finally, we explicitly show an area convex function which is generated\nusing a simple \u201cgadget\" function. We show that this area convex function satis\ufb01es all key requirements\nand hence achieves the desired accelerated rate of convergence. This section closely follows [She17],\nin which the author chooses an area convex function speci\ufb01c to the undirected multicommodity \ufb02ow\nproblem. Due to space constraints, we relegate almost all proofs to Appendix A (in the full version)\nand simply include pointers to proofs in [She17] when it is directly applicable.\n\n4.1 Saddle Point Formulation for MPC\n\npair px, y, zq and psx,sy,szq for (2), we denote w \u201c px, u, y, zq and sw \u201c psx,su,sy,szq where u,su P R.\n\nConsider the saddle point formulation in (2) for MPC. Given a feasible primal-dual feasible solution\nThen, we de\ufb01ne a function Q : Rn`1`p`c \u02c6 Rn`1`p`c \u00d1 R as\n\u00b4 ryT zTs\n\n\uf6be\u201e\n\n\u201e\n\n\uf6be\n\n.\n\nP \u00b41p\n\u00b4C\n1c\n\nx\nu\n\n\u201e\n\n\uf6be\n\uf6be\u201esxsu\nLpx,sy,szq \u00b4 Lpsx, y, zq\n\nP \u00b41p\n\u00b4C\n1c\n\nQpw,swq def\u201c rsyT szTs\nQpw,swq \u201c\n\nNote that if u \u201csu \u201c 1, then\nsupswPW\n\nsxPBn`,8,syP\u2206`\n\nsup\n\np ,szP\u2206`\n\nc\n\nis precisely the primal-dual gap function de\ufb01ned in Section 2. Notice that if px\u02da, y\u02da, z\u02daq is a saddle\npoint of (2), then we have\n\nLpx\u02da, y, zq \u010f Lpx\u02da, y\u02da, z\u02daq \u010f Lpx, y\u02da, z\u02daq\n\np , z P \u2206`\nfor all x P Bn`,8, y P \u2206`\nwhere W def\u201c Bn`,8 \u02c6 t1u \u02c6 \u2206`\nThis motivates the following accuracy measure of the candidate approximate solution w.\nDe\ufb01nition 4.1. 
We say that w P W is an \u03b5-optimal solution of (2) iff\n\nc . From above equation, it is clear that Qpw, w\u02daq \u011b 0 for all w P W\np \u02c6 \u2206`\nc and w\u02da \u201c px\u02da, 1, y\u02da, z\u02daq P W. Moreover, Qpw\u02da, w\u02daq \u201c 0.\n\nsupswPW\n\nRemark 4.2. Recall the de\ufb01nition of MA for a matrix A in Section 3. We can rewrite Qpw,swq \u201c\nswT Jw where J \u201c MH and\n\u201e\n\n\uf6be\n\nQpw,swq \u010f \u03b5.\n\u00bb\u2014\u20130n\u02c6n 0n\u02c61 \u00b4P T\n\n01\u02c6n\nP\n\u00b4C\n\n0\n\u00b41p\n1c\n\nC T\n\u00b41T\n1T\np\nc\n0p\u02c6p 0p\u02c6c\n0c\u02c6c\n0c\u02c6p\n\n\ufb01\ufb03\ufb02 .\n\nThus, the gap function in De\ufb01nition 4.1 can be written in the bilinear form supswPW swT Jw.\nLemma 4.3. Let px, y, zq satisfy suppsx,sy,szqPBn`,8\u02c6\u2206`\n2. y, z satisfy yTpPsx \u00b4 1pq ` zTp\u00b4Csx ` 1cq \u0105 0 for allsx P Bn`,8.\n\nLemma 4.3 relates the \u03b5-optimal solution of (2) to the \u03b5-approximate solution to MPC.\n\nLpx,sy,szq \u00b4 Lpsx, y, zq \u010f \u03b5. Then either\n\n1. x is an \u03b5-approximate solution of MPC, or\n\np \u02c6\u2206`\n\nc\n\nThis lemma states that in order to \ufb01nd an \u03b5-approximate solution of MPC, it suf\ufb01ces to \ufb01nd \u03b5-optimal\nsolution of (2). Henceforth, we will focus on \u03b5-optimality of the saddle point formulation (2).\n\n6\n\nH \u201c\n\nP \u00b41p\n\u00b4C\n1c\n\n\u00f1 J :\u201c\n\n\f4.2 Area Convexity with Saddle Point Framework\n\nHere we state some useful lemmas which help in determining whether a differentiable function is\narea convex. We start with the following remark which follows from the de\ufb01nition of area convexity\n(De\ufb01nition 3.2).\n\nRemark 4.4. 
If \u03c6 is area convex with respect to A on a convex set K, and sK \u010e K is a convex set,\nthen \u03c6 is area convex with respect to A on sK.\n\n\u201e\n\n\uf6be\n0 \u00b41\n0\n1\n\nThe following two lemmas from [She17] provide the key characterization of area convexity.\nLemma 4.5. Let A P R2\u02c62 symmetric matrix. A \u013ei\n\u00f4 A \u013e 0 and detpAq \u011b 1.\nLemma 4.6. Let \u03c6 be twice differentiable on the interior of convex set K, i.e., intpKq.\n\n1. If \u03c6 is area convex with respect to A on intpKq, then d2\u03c6pxq \u013ei MA for all x P intpKq.\n2. If d2\u03c6pxq \u013ei MA for all x P intpKq, then \u03c6 is area convex with respect to 1\nMoreover, if \u03c6 is continuous on clpKq, then \u03c6 is area convex with respect to 1\n\n3 A on intpKq.\n3 A on clpKq.\nIn order to handle the operator \u013ei (recall from Section 2), we state some basic but important properties\nof this operator, which will come in handy in later proofs.\nRemark 4.7. For symmetric matrices A and C and antisymmetric matrices B and D,\n\n1. If A \u013ei B then A \u013eip\u00b4Bq.\n2. If A \u013ei B and \u03bb \u011b 0 then \u03bbA \u013ei \u03bbB.\n3. If A \u013ei B and C \u013ei D then A ` C \u013eipB ` Dq.\n\nHaving laid a basic foundation for area convexity, we now focus on its relevance to solving the saddle\npoint problem (2). Considering Remark 4.2, we can write the gap function criterion of optimality\nin terms of bilinear form of the matrix J. Suppose we have a function \u03c6 which is area convex with\nrespect to H on set W. Then, consider the following jointly-regularized version of the bilinear form:\n(3)\nSimilar to Nesterov\u2019s dual extrapolation, one can attain Op1{\u03b5q convergence of accelerated gradient\n\ndescent for functionr\u03b7pwq in (3) over variable w. In order to obtain gradients ofr\u03b7pwq, we need access\nto argmaxswPW swT Jw \u00b4 \u03c6pswq. 
However, it may not be possible to \ufb01nd an exact maximizer in all\n\nr\u03b7pwq :\u201c supswPW\n\nswT Jw \u00b4 \u03c6pswq.\n\ncases. Again, one can get around this dif\ufb01culty by instead using an approximate optimization oracle\nof the problem in (3).\nDe\ufb01nition 4.8. A \u03b4-optimal solution oracle (OSO) for \u03c6 : W \u00d1 R takes input a and outputs w P W\nsuch that\n\naTsw \u00b4 \u03c6pswq \u00b4 \u03b4.\n\naT w \u00b4 \u03c6pwq \u011b supswPW\n\nGiven \u03a6 as a \u03b4-OSO for a function \u03c6, consider the following algorithm (Algorithm 4.2):\n\nAlgorithm 1 Area Convex Mixed Packing Covering (AC-MPC)\n\nInitialize w0 \u201c p0n, 1, 0p`cq\nfor t \u201c 0, . . . , T do\nend for\n\nwt`1 \u00d0 wt ` \u03a6pJwt ` 2J\u03a6pJwtqq\n\nFor Algorithm 4.2, [She17] shows the following:\n?\nLemma 4.9. Let \u03c6 : W \u00d1 r\u00b4\u03c1, 0s. Suppose \u03c6 is area convex with respect to 2\nfor J \u201c MH and for all t \u011b 1 we have wt{t P W and,\n\n3H on W. Then\n\nIn particular, in \u03c1\n\n\u03b5 iterations, Algorithm 4.2 obtain p\u03b4 ` \u03b5q-solution of the saddle point problem (2).\n\nswJ wt\nt \u010f \u03b4 ` \u03c1\nt .\n\nsupswPW\n\n7\n\n\fThe analysis of this lemma closely follows the analysis of Nesterov\u2019s dual extrapolation.\nNote that, each iteration consists of Op1q matrix-vector multiplications, Op1q vector additions, and\nOp1q calls to the approximate oracle. Since the former two are parallelizable to Oplog nq depth,\nthe same remains to be shown for the oracle computation to complete the proof of the run-time in\nTheorem 1.1.\nRecall from the discussion in Section 3 that the critical bottleneck of Nesterov\u2019s method is that\nnq, which is achieved even in the Euclidean (cid:96)2 norm. 
This makes \u03c1 in\nnq, which can be a major bottleneck for high dimensional LPs, which are\n\ndiameter of the (cid:96)8 ball is \u2126p?\nLemma 4.9 to also be \u2126p?\n\ncommonplace among real-world applications.\nAlthough, on the face of it, area convexity applied to the saddle point formulation (2) has a similar\nframework to Nesterov\u2019s dual extrapolation, the challenge is to construct a \u03c6 for which we can\novercome the above bottleneck. Particularly, there are three key challenges to tackle:\n1. We need to show that existence of a function \u03c6 that is area convex with respect to H on W.\n2. \u03c6 : W \u00d1 r\u00b4\u03c1, 0s should be such that \u03c1 is not too large.\n3. There should exist an ef\ufb01cient \u03b4-OSO for \u03c6.\nIn the next subsection, we focus on these three aspects in order to complete our analysis.\n\n4.3 Choosing an area convex function\n\nFirst, we consider a simple 2-D gadget function and prove a \u201cnice\" property of this gadget. Using\nthis gadget, we construct a function which can be shown to be area convex using the aforementioned\nproperty of the gadget.\nLet \u03b3\u03b2 : R2` \u00d1 R be a function parameterized by \u03b2 de\ufb01ned as\n\u03b3\u03b2pa, bq \u201c ba log a ` \u03b2b log b.\n\uf6be\n\n\u201e\n\nLemma 4.10. Suppose \u03b2 \u011b 2. Then d2\u03b3\u03b2pa, bq \u013e\n\nfor all a P p0, 1s and b \u0105 0.\n\n0 \u00b41\n0\n1\n\nTheorem 4.11. Let w \u201c px, u, y, zq and de\ufb01ne\n\nNow, using the function \u03b3\u03b2, we construct a function \u03c6 and use the suf\ufb01ciency criterion provided in\nLemma 4.6 to show that \u03c6 is area convex with respect to J on W. Note that our set of interest W is\nnot full-dimensional, whereas Lemma (4.6) is only stated for int and not for relint. To get around\n\nthis dif\ufb01culty, we consider a larger set\u010eW \u0104 W such that\u010eW is full dimensional and \u03c6 is area convex\non\u010eW. 
Then we use Remark 4.4 to obtain the final result, i.e., area convexity of φ.

Theorem 4.11. Let w = (x, u, y, z) and define

  φ(w) := Σ_{i=1}^{p} Σ_{j=1}^{n} P_ij · γ_{p_i}(x_j, y_i) + Σ_{i=1}^{c} Σ_{j=1}^{n} C_ij · γ_{c_i}(x_j, z_i) + Σ_{i=1}^{p} γ₂(u, y_i) + Σ_{i=1}^{c} γ₂(u, z_i),

where p_i = 2‖P‖∞/‖P_{i,:}‖₁ and c_i = 2‖C‖∞/‖C_{i,:}‖₁. If W̃ := B^{n+1}_{+,∞}(1) × Δ⁺_p × Δ⁺_c, then φ is area convex with respect to (1/3)·[[P, −1_p], [−C, 1_c]] on W̃. In particular, this also implies that 6√3·φ is area convex with respect to 2√3·[[P, −1_p], [−C, 1_c]] on the set W.

Theorem 4.11 addresses the first of the three key challenges. Next, Lemma 4.12 gives an upper bound on the range of φ.

Lemma 4.12. The function φ maps W into [−ρ, 0], where ρ = O(‖P‖∞ log p + ‖C‖∞ log c).

Finally, we need an efficient δ-OSO. Consider the following alternating maximization algorithm.

Algorithm 2 δ-OSO for φ
  Input: a ∈ R^{n+1}, a₁ ∈ R^p, a₂ ∈ R^c, δ > 0
  Initialize (x₀, u₀) ∈ Bⁿ₊,∞ × {1} arbitrarily.
  for k = 1, . . . , K do
    (y_k, z_k) ← argmax_{y ∈ Δ⁺_p, z ∈ Δ⁺_c} yᵀa₁ + zᵀa₂ − φ(x_{k−1}, u_{k−1}, y, z)
    (x_k, u_k) ← argmax_{(x,u) ∈ Bⁿ₊,∞ × {1}} [x; u]ᵀa − φ(x, u, y_k, z_k)
  end for

[Bec15] shows the following convergence result.

Lemma 4.13. For δ > 0, Algorithm 2 is a δ-OSO for φ which converges in O(log(1/δ)) iterations.

We show that for our chosen φ, the two argmax computations in each iteration of Algorithm 2 can be performed analytically in O(nnz(P) + nnz(C)) time, and hence we obtain a δ-OSO which takes O((nnz(P) + nnz(C)) log(1/δ)) total work.
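The alternating pattern of Algorithm 2, exact maximization over one block and then the other, can be sketched on a toy jointly concave objective where each block argmax has a closed form. The quadratic objective below is an illustrative stand-in for the paper's aᵀw − φ(w), not the actual φ.

```python
import numpy as np

def alternating_argmax(a, b, Q, iters=50):
    """Alternating exact block maximization (the scheme of Algorithm 2 /
    [Bec15]) on the toy jointly concave objective
        f(x, y) = a^T x + b^T y - x^T Q y - 0.5*||x||^2 - 0.5*||y||^2,
    which is concave whenever ||Q||_2 <= 1. Each block argmax is closed-form:
        argmax_x f = a - Q y,   argmax_y f = b - Q^T x,
    and the alternation contracts at rate ||Q||_2^2."""
    x = np.zeros_like(a)
    y = np.zeros_like(b)
    for _ in range(iters):
        y = b - Q.T @ x   # exact argmax over the (y, z)-type block
        x = a - Q @ y     # exact argmax over the (x, u)-type block
    return x, y
```

In the paper's setting the two inner argmax steps are not quadratic but, as Lemma 4.14 shows, they are still available in closed form, which is what makes each iteration of Algorithm 2 cheap.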
Parallelizing the matrix-vector multiplications eliminates the dependence of the parallel depth on nnz(P) and nnz(C), at the cost of another log(N) factor.

Lemma 4.14. Each argmax in Algorithm 2 can be computed as follows:

  (x_k)_j = min{ exp( a_j / (Pᵀy_k + Cᵀz_k)_j − 1 ), 1 }  for all j ∈ [n],
  y_k = proj_{Δ⁺_p}( exp( (a₁ − P(x_{k−1} log x_{k−1})) / (2(‖P‖∞ + 1)) ) ),
  z_k = proj_{Δ⁺_c}( exp( (a₂ − C(x_{k−1} log x_{k−1})) / (2(‖C‖∞ + 1)) ) ),

where exp and log are applied entrywise. In particular, we can compute x_k, y_k, z_k in O(nnz(P) + nnz(C)) work and O(log N) parallel time.

Acknowledgements

We thank Richard Peng for many important pointers and discussions.

References

[AO19] Zeyuan Allen-Zhu and Lorenzo Orecchia. Nearly linear-time packing and covering LP solvers: achieving width-independence and 1/ε-convergence. Math. Program., 175(1-2):307–353, 2019.

[BBR04] Yair Bartal, John W. Byers, and Danny Raz. Fast, distributed approximation algorithms for positive linear programming with applications to flow control. SIAM J. Comput., 33(6):1261–1279, 2004.

[Bec15] Amir Beck. On the convergence of alternating minimization for convex programming with applications to iteratively reweighted least squares and decomposition schemes. SIAM Journal on Optimization, 25(1):185–209, 2015.

[BGM14] Bahman Bahmani, Ashish Goel, and Kamesh Munagala. Efficient primal-dual graph algorithms for MapReduce. In Algorithms and Models for the Web Graph - 11th International Workshop, WAW 2014, Beijing, China, December 17-18, 2014, Proceedings, pages 59–78, 2014.

[BI06] Daniel Bienstock and Garud Iyengar. Approximating fractional packings and coverings in O(1/ε) iterations. SIAM J. Comput., 35(4):825–854, 2006.

[Bub14] Sébastien Bubeck. Theory of convex optimization for machine learning.
arXiv preprint arXiv:1405.4980, 2014.

[Cha00] Moses Charikar. Greedy approximation algorithms for finding dense components in a graph. In Proceedings of the Third International Workshop on Approximation Algorithms for Combinatorial Optimization, APPROX '00, pages 84–95, Berlin, Heidelberg, 2000.

[GK96] Michael D. Grigoriadis and Leonid G. Khachiyan. Approximate minimum-cost multicommodity flows in Õ(ε⁻²knm) time. Math. Program., 75:477–482, 1996.

[GN15] Cristóbal Guzmán and Arkadi Nemirovski. On lower complexity bounds for large-scale smooth convex optimization. Journal of Complexity, 31(1):1–14, 2015.

[LN93] Michael Luby and Noam Nisan. A parallel approximation algorithm for positive linear programming. In Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, May 16-18, 1993, San Diego, CA, USA, pages 448–457, 1993.

[MRWZ16] Michael W. Mahoney, Satish Rao, Di Wang, and Peng Zhang. Approximating the solution to mixed packing and covering LPs in parallel Õ(ε⁻³) time. In 43rd International Colloquium on Automata, Languages, and Programming, ICALP 2016, July 11-15, 2016, Rome, Italy, pages 52:1–52:14, 2016.

[Nes05] Yurii Nesterov. Smooth minimization of non-smooth functions. Math. Program., 103(1):127–152, 2005.

[Nes07] Yurii Nesterov. Dual extrapolation and its applications to solving variational inequalities and related problems. Math. Program., 109(2-3):319–344, 2007.

[Nes12] Yurii Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22(2):341–362, 2012.

[PST95] Serge A. Plotkin, David B. Shmoys, and Éva Tardos. Fast approximation algorithms for fractional packing and covering problems. Math. Oper. Res., 20(2):257–301, 1995.

[She17] Jonah Sherman.
Area-convexity, ℓ∞ regularization, and undirected multicommodity flow. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017, pages 452–460, 2017.

[WRM16] Di Wang, Satish Rao, and Michael W. Mahoney. Unified acceleration method for packing and covering problems via diameter reduction. In 43rd International Colloquium on Automata, Languages, and Programming, ICALP 2016, July 11-15, 2016, Rome, Italy, pages 50:1–50:13, 2016.

[You01] Neal E. Young. Sequential and parallel algorithms for mixed packing and covering. In 42nd Annual Symposium on Foundations of Computer Science, FOCS 2001, 14-17 October 2001, Las Vegas, Nevada, USA, pages 538–546, 2001.

[You14] Neal E. Young. Nearly linear-time approximation schemes for mixed packing/covering and facility-location linear programs. CoRR, abs/1407.3015, 2014.

[ZN01] Edo Zurel and Noam Nisan. An efficient approximate allocation algorithm for combinatorial auctions. In Proceedings 3rd ACM Conference on Electronic Commerce (EC-2001), Tampa, Florida, USA, October 14-17, 2001, pages 125–136, 2001.