{"title": "Optimal Analysis of Subset-Selection Based L_p Low-Rank Approximation", "book": "Advances in Neural Information Processing Systems", "page_first": 2541, "page_last": 2552, "abstract": "We show that for the problem of $\\ell_p$ rank-$k$ approximation of any given matrix over $R^{n\\times m}$ and $C^{n\\times m}$, the algorithm of column subset selection enjoys approximation ratio $(k+1)^{1/p}$ for $1\\le p\\le 2$ and $(k+1)^{1-1/p}$ for $p\\ge 2$. This improves upon the previous $O(k+1)$ bound (Chierichetti et al.,2017) for $p\\ge 1$. We complement our analysis with lower bounds; these bounds match our upper bounds up to constant 1 when $p\\geq 2$. At the core of our techniques is an application of Riesz-Thorin interpolation theorem from harmonic analysis, which might be of independent interest to other algorithmic designs and analysis more broadly.\n\nOur analysis results in improvements on approximation guarantees of several other algorithms with various time complexity. For example, to make the algorithm of column subset selection computationally efficient, we analyze a polynomial time bi-criteria algorithm which selects $O(k\\log m)$ number of columns. We show that this algorithm has an approximation ratio of $O((k+1)^{1/p})$ for $1\\le p\\le 2$ and $O((k+1)^{1-1/p})$ for $p\\ge 2$. This improves over the bound in (Chierichetti et al.,2017) with an $O(k+1)$ approximation ratio. 
Our bi-criteria algorithm also implies an exact-rank method in polynomial time with a slightly larger approximation ratio.", "full_text": "Optimal Analysis of Subset-Selection Based ℓp Low-Rank Approximation\n\nChen Dan\nCarnegie Mellon University\ncdan@cs.cmu.edu\n\nHong Wang*\nPrinceton University\nHong.Wang1991@gmail.com\n\nHongyang Zhang*\nToyota Technological Institute at Chicago\nhonyanz@ttic.edu\n\nYuchen Zhou*\nUniversity of Wisconsin, Madison\nyuchenzhou@stat.wisc.edu\n\nPradeep Ravikumar\nCarnegie Mellon University\npradeepr@cs.cmu.edu\n\nAbstract\n\nWe study the low rank approximation problem of any given matrix A over R^{n×m} or C^{n×m} in entry-wise ℓp loss, that is, finding a rank-k matrix X such that ‖A − X‖_p is minimized. Unlike the traditional ℓ2 setting, this particular variant is NP-hard. We show that the algorithm of column subset selection, which was an algorithmic foundation of many existing algorithms, enjoys approximation ratio (k+1)^{1/p} for 1 ≤ p ≤ 2 and (k+1)^{1−1/p} for p ≥ 2. This improves upon the previous O(k+1) bound for p ≥ 1 [1]. We complement our analysis with lower bounds; these bounds match our upper bounds up to the constant 1 when p ≥ 2. At the core of our techniques is an application of the Riesz-Thorin interpolation theorem from harmonic analysis, which might be of independent interest to other algorithmic designs and analyses more broadly.\nAs a consequence of our analysis, we provide better approximation guarantees for several other algorithms with various time complexities. For example, to make the algorithm of column subset selection computationally efficient, we analyze a polynomial-time bi-criteria algorithm which selects O(k log m) columns. We show that this algorithm has an approximation ratio of O((k+1)^{1/p}) for 1 ≤ p ≤ 2 and O((k+1)^{1−1/p}) for p ≥ 2. 
This improves over the best-known bound with an O(k+1) approximation ratio. Our bi-criteria algorithm also implies an exact-rank method in polynomial time with a slightly larger approximation ratio.\n\n1 Introduction\n\nLow rank approximation has wide applications in compressed sensing, numerical linear algebra, machine learning, and many other domains. In compressed sensing, low rank approximation serves as an indispensable building block for data compression. In numerical linear algebra and machine learning, low rank approximation is the foundation of many data processing algorithms, such as PCA. Given a data matrix A ∈ F^{n×m}, low rank approximation aims at finding a low-rank matrix\n\n*Equal Contribution\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nX ∈ F^{n×m} such that\n\nOPT = min_{X: rank(X) ≤ k} ‖X − A‖.  (1)\n\nHere the field F can be either R or C. The focus of this work is on the case when ‖·‖ is the entry-wise ℓp norm, and we are interested in an estimate X̂ with a tight approximation ratio α, so that we have the guarantee ‖X̂ − A‖ ≤ α · OPT.\n\nAs noted earlier, such low-rank approximation is a fundamental workhorse of machine learning. The key reason to focus on approximations with respect to general ℓp norms, in contrast to the typical ℓ2 norm, is that these general ℓp norms are better able to capture a broader range of realistic noise in complex datasets. For example, it is well known that the ℓ1 norm is more robust to sparse outliers [2-4]. The ℓ1 low-rank approximation problem is thus a robust version of the classic PCA, which uses the ℓ2 norm, and it has received tremendous attention in machine learning, computer vision and data mining [5], [6], [7]. 
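To make the objective in (1) concrete: for p = 2 the entry-wise loss coincides with the Frobenius norm, and the truncated SVD attains OPT exactly (Eckart-Young); this is the baseline the general-p problem departs from. Below is a minimal numerical sketch of the loss and this p = 2 baseline; the helper names are ours and numpy is assumed:

```python
import numpy as np

def lp_error(A, X, p):
    """Entry-wise l_p loss ||A - X||_p = (sum_ij |A_ij - X_ij|^p)^(1/p)."""
    return float((np.abs(A - X) ** p).sum() ** (1.0 / p))

def truncated_svd(A, k):
    """Best rank-k approximation under entry-wise l_2 (= Frobenius) loss."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))
X2 = truncated_svd(A, 2)                       # OPT for p = 2 (Eckart-Young)
tail = np.linalg.svd(A, compute_uv=False)[2:]  # discarded singular values
assert np.isclose(lp_error(A, X2, 2), np.sqrt((tail ** 2).sum()))
```

For p ≠ 2 no such closed form is available, which is exactly where the NP-hardness discussed next sets in.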
A related problem, ℓp linear regression, has also been studied extensively in the statistics community, and the two problems share a similar motivation. In particular, if we assume a statistical model A_ij = A*_ij + ε_ij, where A* is a low rank matrix and the ε_ij are i.i.d. noise, then different values of p correspond to the MLE under different noise distributions, say p = 1 for Laplacian noise and p = 2 for Gaussian noise.\nWhile it has better empirical and statistical properties, the key bottleneck to solving the problem in (1) is computational: the problem is known to be NP-hard in general. For example, the ℓ1 low-rank approximation is NP-hard to solve exactly even when k = 1 [8], and is even hard to approximate with large error under the Exponential Time Hypothesis [9]. [10] proved the NP-hardness of the problem when p = ∞. A recent work [11] proves that the problem has no constant factor approximation algorithm running in time O(2^{k^δ}) for a constant δ > 0, assuming the correctness of the Small Set Expansion Hypothesis and the Exponential Time Hypothesis. The authors also proposed a PTAS (Polynomial Time Approximation Scheme) with (1+ε) approximation ratio when 0 < p < 2. However, the running time is as large as O(n^{poly(k/ε)}).\nMany other efforts have been devoted to designing approximation algorithms in order to alleviate the computational issues of ℓp low-rank approximation. One promising approach is to apply subgradient descent based methods or alternating minimization [12]. Unfortunately, the loss surface of problem (1) suffers from saddle points even in the simplest p = 2 case [13], which might be arbitrarily worse than OPT. 
Therefore, these local search algorithms may not work well for the low-rank approximation problem, as they may easily get stuck at bad stationary points without any guarantee.\nInstead, we consider another line of research: the heuristic algorithm of column subset selection (CSS). Here, the algorithm proceeds by choosing the best k columns of A as an estimate of the column space of X and then solving an ℓp linear regression problem in order to obtain the optimal row space of X. See Algorithm 1 for the detailed procedure. Although the vanilla form of the subset-selection based algorithm has a time complexity exponential in the rank k, it can be slightly modified into polynomial-time bi-criteria algorithms which select more than k columns [1]. Most importantly, these algorithms are easy to implement and run fast with nice empirical performance. Thus, subset selection based algorithms might seem to effectively alleviate the computational issues of problem (1). The caveat, however, is that CSS might seem like a simple heuristic, with potentially a very large worst-case approximation ratio α.\nIn this paper, we show that CSS yields surprisingly reasonable approximation ratios, which we also show to be tight by providing corresponding lower bounds, thus providing a strong theoretical backing for the empirical observations underlying CSS.\nDue in part to its importance, there has been a burgeoning set of recent analyses of column subset selection. In the traditional low rank approximation problem with Frobenius norm error (the p = 2 case in our setting), [14] showed that CSS achieves a √(k+1) approximation ratio. The authors also showed that the √(k+1) bound is tight (both the upper and lower bounds can be recovered by our analysis). [15-18] improved the running time of CSS with different sampling schemes while preserving similar approximation bounds. 
The CSS algorithm and its variants have also been applied and analyzed under various other settings. For instance, [19] and [20] studied the CUR decomposition with the Frobenius norm. [21] studied the CSS problem in the missing-data case. [22] studied CSS for non-negative matrices under ℓ1 error. [23] gave tight approximation bounds for CSS under the finite-field binary matrix setting. Furthermore, [9] considered low rank tensor approximation with the Frobenius norm.\nDespite a large amount of work on the subset-selection algorithm and the ℓp low rank approximation problem, many fundamental questions remain unresolved. Probably one of the most important open questions is: what is the tight approximation ratio α for the subset-selection algorithm in the ℓp low rank approximation problem, up to a constant factor? In [1], the approximation ratio is shown to be upper bounded by (k+1) and lower bounded by Ω(k^{1−2/p}) when p > 2. This problem becomes even more challenging when one requires the approximation ratio to be tight up to the factor 1, as little was known about a direct tool to achieve this goal in general. In this work, we improve both the upper and lower bounds in [1] to optimal when p > 2. Note that our bounds are still applicable and improve over [1] when 1 < p < 2, but there is an O(k^{2/p−1}) gap between the upper and lower bounds.\n\n1.1 Our Results\n\nThe best-known approximation ratio of subset selection based algorithms for ℓp low-rank approximation is O(k+1) [1]. In this work, we give an improved analysis of this algorithm. 
In particular, we show that Column Subset Selection (Algorithm 1) is a c_{p,k}-approximation, where\n\nc_{p,k} = (k+1)^{1/p} for 1 ≤ p ≤ 2, and c_{p,k} = (k+1)^{1−1/p} for p ≥ 2.\n\nThis improves over Theorem 4 in [1], which proved that the algorithm is an O(k+1)-approximation for all p ≥ 1. Below, we state our main theorem formally:\nTheorem 1.1 (Upper bound). The subset selection algorithm in Algorithm 1 is a c_{p,k}-approximation.\nOur proof of Theorem 1.1 is built upon the Riesz-Thorin interpolation theorem: having proved the special cases p = 1, 2, ∞, we are able to interpolate the approximation ratio for all intermediate p's. Our techniques might be of independent interest to other ℓp norm or Schatten-p norm related problems more broadly. See Section 1.2 for more discussion.\nWe also complement our positive result on the subset selection algorithm with a negative result. Surprisingly, our upper bound matches our lower bound exactly, up to the constant 1, for p ≥ 2. Below, we state our negative result formally:\nTheorem 1.2 (Lower bound). There exist infinitely many different values of k such that the approximation ratio of any k-subset-selection based algorithm is at least (k+1)^{1−1/p} for ℓp rank-k approximation.\nNote that our lower bound strictly improves the (k+1)^{1−2/p} bound in [1]. The main idea of the proof can be found in Section 1.2 and we put the whole proof of Theorem 1.2 in Appendix 3.\nOne drawback of Algorithm 1 is that its running time scales exponentially with the rank k. However, it serves as an algorithmic foundation for many existing computationally efficient algorithms. For example, a bi-criteria variant of this algorithm runs in polynomial time, only requiring the rank parameter to be a little over-parameterized. Our new analysis applies to this algorithm as well. Below, we state our result informally:\nTheorem 1.3 (Informal statement of Theorem 4.1). 
There is a bi-criteria algorithm which runs in poly(m, n, k) time and selects O(k log m) columns of A. The algorithm is an O(c_{p,k})-approximation algorithm.\n\nOur next result is a computationally efficient, exact-rank algorithm with a slightly larger approximation ratio. Below, we state our result informally:\nTheorem 1.4 (Informal statement of Theorem 4.2). There is an algorithm which solves problem (1) and runs in poly(m, n) time with an O(c_{p,k}^3 k log m)-approximation ratio, provided that k = O(log n / log log n).\n\n1.2 Our Techniques\n\nIn this section, we give a detailed discussion of the techniques used in our proofs. We start with the analysis of the approximation ratio of the column subset selection algorithm.\n\nAlgorithm 1 A c_{p,k} approximation to problem (1) by column subset selection.\n1: Input: Data matrix A and rank parameter k.\n2: Output: X ∈ R^{n×m} such that rank(X) ≤ k and ‖X − A‖_p ≤ c_{p,k} · OPT.\n3: for I ∈ {S ⊆ [m]; |S| = k} do\n4:   U ← A_I.\n5:   Run ℓp linear regression over V to minimize the loss ‖A − U V‖_p.\n6:   Let X_I = U V.\n7: end for\n8: return the X_I which minimizes ‖A − X_I‖_p over I ∈ {S ⊆ [m]; |S| = k}.\n\nRemark: Throughout this paper, we state the theorems for real matrices. The results can be naturally generalized to complex matrices as well.\nNotation: We denote by A ∈ R^{n×m} the input matrix, and by A_i the i-th column of A. A* = U V is the optimal rank-k approximation, where U ∈ R^{n×k}, V ∈ R^{k×m}. ∆_i = A_i − U V_i is the error vector on the i-th column, and ∆_{li} is the l-th element of the vector ∆_i. For any X ∈ R^{n×k}, define the error of projecting A onto X by Err(X) = min_{Y∈R^{k×m}} ‖A − XY‖_p. Let J = (j_1, ..., j_k) ∈ [m]^k be a subset of [m] with cardinality k. 
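For intuition, Algorithm 1 can be sketched in a few lines for the p = 2 case, where the ℓp regression of step 5 reduces to ordinary least squares; for general p one would swap in an ℓp regression solver (e.g., IRLS). This is an illustrative sketch with names of our choosing, not the paper's implementation; the final assert checks the c_{2,k} = √(k+1) guarantee of Theorem 1.1:

```python
import itertools
import numpy as np

def css_best_subset(A, k):
    """Algorithm 1 sketch for p = 2: enumerate k-column subsets, solve the
    regression step by least squares, and keep the best fit."""
    n, m = A.shape
    best_err, best_X = np.inf, None
    for I in itertools.combinations(range(m), k):
        U = A[:, list(I)]                          # step 4: U <- A_I
        V = np.linalg.lstsq(U, A, rcond=None)[0]   # step 5: l_2 regression
        err = np.linalg.norm(A - U @ V)            # entry-wise l_2 loss
        if err < best_err:
            best_err, best_X = err, U @ V          # step 6: X_I = U V
    return best_X, best_err

rng = np.random.default_rng(1)
A, k = rng.standard_normal((8, 6)), 2
_, err = css_best_subset(A, k)
opt = np.sqrt((np.linalg.svd(A, compute_uv=False)[k:] ** 2).sum())  # p = 2 OPT
assert err <= np.sqrt(k + 1) * opt + 1e-9          # c_{2,k} = (k+1)^(1/2)
```

The enumeration over all C(m, k) subsets is what makes the vanilla algorithm exponential in k, motivating the bi-criteria variant of Theorem 1.3.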
We denote by X_J the following column subset of a matrix X: X_J = [X_{j_1}, X_{j_2}, ..., X_{j_k}]. Similarly, we denote by X_{dJ} the corresponding column subset in the d-th row of X: X_{dJ} = [X_{dj_1}, X_{dj_2}, ..., X_{dj_k}]. Denote by J* the column subset which gives the smallest approximation error, i.e., J* = argmin_{J∈[m]^k} Err(A_J).\nAnalysis in Previous Work: In order to show that the column subset selection algorithm gives an α-approximation, we need to prove that\n\nErr(A_{J*}) ≤ α ‖∆‖_p.  (2)\n\nDirectly bounding Err(A_{J*}) is prohibitive. In [1], the authors proved an upper bound of (k+1) in two steps. First, the authors constructed a specific S ∈ [m]^k and upper bounded Err(A_{J*}) by Err(A_S). Their construction is as follows: S is defined as the minimizer\n\nS = argmin_{J∈[m]^k} Π_{j∈J} ‖∆_j‖_p / |det(V_J)|.\n\nIn the second step, [1] upper bounded Err(A_S) by considering the approximation error on each column A_i, and upper bounded the ℓp distance from A_i to the subspace spanned by A_S using the triangle inequality of the ℓp distance. They showed that this distance is at most (k+1) times ‖∆_i‖_p, uniformly over all columns i ∈ [m]. Therefore, the approximation ratio is bounded by (k+1). Our approach differs from the above analysis in both steps.\n\nWeighted Average: In the first step, we use a so-called weighted average technique, inspired by the approach in [14, 23]. Instead of using the error of one specific column subset as an upper bound, we use a weighted average over all possible column subsets, i.e.,\n\nErr^p(A_{J*}) ≤ Σ_{J∈[m]^k} w_J Err^p(A_J),\n\nwhere the weights w_J are carefully chosen for each column subset J. 
This weighted average technique captures more information from all possible column subsets, rather than from only one specific subset, and leads to a tighter bound.\n\nRiesz-Thorin Interpolation Theorem: In the second step, unlike [1], which simply used the triangle inequality to prove the upper bound, our technique leads to a more refined analysis of the upper bounds on the approximation error for each subset J ∈ [m]^k. With the weighted average technique of the first step, proving a technical inequality (Lemma 2.2) concerning determinants suffices to complete the analysis of the approximation ratio. In the proof of this lemma, we introduce several powerful tools from harmonic analysis, namely the theory of interpolating linear operators. The Riesz-Thorin theorem is a classical result in interpolation theory that gives bounds for Lp to Lq operator norms. In general, it is easier to prove estimates within spaces like L2, L1 and L∞. Interpolation theory enables us to generalize results in those spaces to some Lp and Lq spaces with an explicit operator norm. By the Riesz-Thorin interpolation theorem, we are able to prove the lemma by just checking the special cases p = 1, 2, ∞, and then interpolating the inequality for all intermediate values of p.\n\nLower Bounds: We now discuss the techniques used in proving the lower bounds. Our proof is a generalization of [14], which shows that for the special case p = 2, √(k+1) is the best possible approximation ratio. Their proof of the lower bound is constructive: they constructed a (k+1) × (k+1) matrix A such that using any k-subset leads to a sub-optimal solution by a factor no less than (1 − ε)√(k+1). However, since the ℓp norm is not rotationally invariant in general, it is tricky to generalize their analysis to other values of p. 
To resolve the problem, we use a specialized version of their construction, the perturbed Hadamard matrices (see Section 3 for details), as they have nice symmetry and are much easier to analyze. We give an example of the special case k = 3 for better intuition:\n\nA = ( ε   ε   ε   ε\n      1  −1   1  −1\n      1   1  −1  −1\n      1  −1  −1   1 ).\n\nHere ε is a positive constant close to 0. We note that A is very close to a rank-3 matrix: if we replace the first row by four zeros, then it becomes rank-3. Thus, the optimal rank-3 approximation error is at most (4 ε^p)^{1/p} = 4^{1/p} ε. Now we consider the column subset selection algorithm. Suppose, for example, that we use the first three columns A_1, A_2, A_3 to approximate the whole matrix; the error then only comes from the fourth column. We can show that when ε is small, the projection of A_4 onto span{A_1, A_2, A_3} is very close to\n\n−A_1 − A_2 − A_3 = (−3ε, −1, −1, 1)^T.\n\nTherefore, the column subset selection algorithm achieves about 4ε error on this matrix, which is a factor 4^{1−1/p} from optimal. A similar construction works for any integer k = 2^r − 1, r ∈ Z+, where the lower bound becomes (k+1)^{1−1/p}; this also matches our upper bound exactly when p ≥ 2.\n\n2 Analysis of Approximation Ratio\n\nIn this section, we will prove Theorem 1.1. Recall that our goal is to bound Err(A_{J*}). We first introduce two useful lemmas. Lemma 2.1 gives an upper bound on the approximation error of choosing a single arbitrary column subset A_J. Lemma 2.2 is our main technical lemma.\nLemma 2.1. If J satisfies det(V_J) ≠ 0, then the approximation error of A_J can be upper bounded by\n\nErr^p(A_J) ≤ ‖∆ − ∆_J V_J^{−1} V‖_p^p.\n\nLemma 2.2. 
Let S = {s_ij} ∈ C^{k×m} be a complex matrix, let r = (r_1, ..., r_m) be an m-dimensional complex vector, and let T = [ r ; S ] ∈ C^{(k+1)×m}. Then we have\n\nΣ_{I∈[m]^{k+1}} |det(T_I)|^p ≤ C_{p,k} Σ_{l=1}^m |r_l|^p Σ_{J∈[m]^k} |det(S_J)|^p,\n\nwhere\n\nC_{p,k} = c_{p,k}^p = (k+1) for 1 ≤ p ≤ 2, and C_{p,k} = (k+1)^{p−1} for p ≥ 2.\n\nWe first show that Theorem 1.1 has a clean proof using the two lemmas, as stated below.\n\nProof of Theorem 1.1: We can WLOG assume that rank(V) = k. In fact, if rank(A) < k, then Err^p(A_{J*}) = 0 and there is nothing to prove. Otherwise, if rank(A) ≥ k, then by the definition of V we know that rank(V) = k.\nWe will upper bound the approximation error of the best column subset by a weighted average of Err^p(A_J). In other words, we are going to choose a set of non-negative weights w_J such that Σ_{J∈[m]^k} w_J = 1, and upper bound Err^p(A_{J*}) by\n\nErr^p(A_{J*}) = min_{J∈[m]^k} Err^p(A_J) ≤ Σ_{J∈[m]^k} w_J Err^p(A_J).\n\nIn the following analysis, our choice of w_J will be\n\nw_J = |det(V_J)|^p / Σ_{I∈[m]^k} |det(V_I)|^p.\n\nSince rank(V) = k, the w_J are well-defined. We first prove\n\n|det(V_J)|^p Err^p(A_{J*}) ≤ Σ_{d=1}^n Σ_{i=1}^m |det( [ ∆_{dL} ; V_L ] )|^p,  (3)\n\nwhere we denote L = (i, J) = (i, j_1, ..., j_k) ∈ [m]^{k+1}.\nIn fact, when det(V_J) = 0, the LHS of (3) is 0 ≤ RHS. When det(V_J) ≠ 0, we know that V_J is invertible. 
By Lemma 2.1,\n\n|det(V_J)|^p Err^p(A_{J*}) ≤ |det(V_J)|^p ‖∆ − ∆_J V_J^{−1} V‖_p^p\n= ‖det(V_J) (∆ − ∆_J V_J^{−1} V)‖_p^p\n= Σ_{i=1}^m ‖det(V_J) (∆_i − ∆_J V_J^{−1} V_i)‖_p^p\n= Σ_{i=1}^m Σ_{d=1}^n |det(V_J) (∆_{di} − ∆_{dJ} V_J^{−1} V_i)|^p\n= Σ_{i=1}^m Σ_{d=1}^n |det( [ ∆_{di} ∆_{dJ} ; V_i V_J ] )|^p\n= Σ_{i=1}^m Σ_{d=1}^n |det( [ ∆_{dL} ; V_L ] )|^p.\n\nThe second to last equality follows from Schur's determinant identity. Therefore (3) holds, and\n\nErr^p(A_{J*}) ≤ Σ_{J∈[m]^k} ( |det(V_J)|^p / Σ_{I∈[m]^k} |det(V_I)|^p ) Err^p(A_J)\n≤ Σ_{d=1}^n ( 1 / Σ_{I∈[m]^k} |det(V_I)|^p ) Σ_{L∈[m]^{k+1}} |det( [ ∆_{dL} ; V_L ] )|^p.\n\nBy Lemma 2.2,\n\nΣ_{L∈[m]^{k+1}} |det( [ ∆_{dL} ; V_L ] )|^p ≤ C_{p,k} Σ_{j=1}^m |∆_{dj}|^p Σ_{I∈[m]^k} |det(V_I)|^p.\n\nTherefore,\n\nErr^p(A_{J*}) ≤ C_{p,k} Σ_{d=1}^n Σ_{j=1}^m |∆_{dj}|^p = C_{p,k} Σ_{j=1}^m ‖∆_j‖_p^p = C_{p,k} OPT^p,\n\nwhich means\n\nErr(A_{J*}) ≤ C_{p,k}^{1/p} OPT = c_{p,k} OPT.\n\nTherefore, we only need to prove the two lemmas. 
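As a numerical sanity check, the inequality of Lemma 2.2 can be tested on small random instances using its k-subset form, Σ_{I} |det(T_I)|^p ≤ (C_{p,k}/(k+1)) Σ_t |r_t|^p Σ_{J} |det(S_J)|^p, where I and J range over (k+1)-subsets and k-subsets of the columns. The sketch below (numpy assumed, names ours) checks it for several values of p:

```python
import itertools
import numpy as np

def lemma22_holds(S, r, p, tol=1e-9):
    """Check sum_I |det T_I|^p <= (C_{p,k}/(k+1)) * sum_t |r_t|^p * sum_J |det S_J|^p,
    with I over (k+1)-subsets and J over k-subsets of the m columns."""
    k, m = S.shape
    T = np.vstack([r, S])                              # T = [r; S], (k+1) x m
    C = (k + 1.0) if p <= 2 else (k + 1.0) ** (p - 1)  # C_{p,k}
    lhs = sum(abs(np.linalg.det(T[:, list(I)])) ** p
              for I in itertools.combinations(range(m), k + 1))
    rhs = C / (k + 1) * (np.abs(r) ** p).sum() * sum(
        abs(np.linalg.det(S[:, list(J)])) ** p
        for J in itertools.combinations(range(m), k))
    return lhs <= rhs + tol

rng = np.random.default_rng(2)
S, r = rng.standard_normal((3, 6)), rng.standard_normal(6)   # k = 3, m = 6
assert all(lemma22_holds(S, r, p) for p in (1.0, 1.5, 2.0, 3.0, 4.0))
```

For k = 1, p ≥ 2 and r = (1, 1), S = (1, −1) the inequality is met with equality, consistent with the tightness of the lower bound for p ≥ 2.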
Lemma 2.1 is relatively easy to prove.\n\nProof of Lemma 2.1: Recall that by the definition of ∆_i, A_i = U V_i + ∆_i. Hence\n\nErr^p(A_J) = min_{Y∈R^{k×m}} ‖A − A_J Y‖_p^p\n≤ ‖A − A_J V_J^{−1} V‖_p^p\n= ‖(U V + ∆) − (U V_J + ∆_J) V_J^{−1} V‖_p^p\n= ‖∆ − ∆_J V_J^{−1} V‖_p^p.\n\nThe main difficulty in our analysis comes from Lemma 2.2. The proof is based on the Riesz-Thorin interpolation theorem from harmonic analysis. Although the technical details in verifying a key inequality (4) are quite involved, the remaining part, which connects Lemma 2.2 to the Riesz-Thorin interpolation theorem, is not difficult to understand. Below we give a proof of Lemma 2.2 without verifying (4), and leave the complete proof of (4) to the appendix.\n\nProof of Lemma 2.2: We first state a simplified version of the Riesz-Thorin interpolation theorem, which is the most convenient version for our proof. The general version can be found in the appendix.\nLemma 2.3 (Simplified version of Riesz-Thorin). Let Λ : C^{n_1} × C^{n_2} → C^{n_0} be a multi-linear operator such that the inequalities\n\n‖Λ(a, b)‖_{p_0} ≤ M_{p_0} ‖a‖_{p_0} ‖b‖_{p_0},\n‖Λ(a, b)‖_{p_1} ≤ M_{p_1} ‖a‖_{p_1} ‖b‖_{p_1}\n\nhold for all a ∈ C^{n_1}, b ∈ C^{n_2}. Then\n\n‖Λ(a, b)‖_{p_θ} ≤ M_{p_0}^{1−θ} M_{p_1}^{θ} ‖a‖_{p_θ} ‖b‖_{p_θ}\n\nholds for all θ ∈ [0, 1], where\n\n1/p_θ := (1−θ)/p_0 + θ/p_1.\n\nThe Riesz-Thorin theorem is a classical result in interpolation theory that gives bounds for Lp to Lq operator norms. In general, it is easier to prove estimates within spaces like L2, L1 and L∞. 
Interpolation theory enables us to generalize results in those spaces to the Lp and Lq spaces in between, with an explicit operator norm. In our application, each underlying space X_i is a set of n_i elements and the corresponding function space V_i is C^{n_i}, the space of functions on n_i elements.\nNow we prove Lemma 2.2. In fact, by symmetry, Lemma 2.2 is equivalent to\n\n(k+1)! Σ_{I∈([m] choose k+1)} |det(T_I)|^p ≤ k! C_{p,k} Σ_{t=1}^m |r_t|^p Σ_{J∈([m] choose k)} |det(S_J)|^p.\n\nHere, ([m] choose k) = {(i_1, ..., i_k) | 1 ≤ i_1 < i_2 < ... < i_k ≤ m} denotes the k-subsets of [m]. Taking the 1/p-th power on both sides, we have the following equivalent form:\n\n( Σ_{I∈([m] choose k+1)} |det(T_I)|^p )^{1/p} ≤ ( c_{p,k} / (k+1)^{1/p} ) ( Σ_{t=1}^m |r_t|^p )^{1/p} ( Σ_{J∈([m] choose k)} |det(S_J)|^p )^{1/p}.\n\nBy Laplace expansion along the first row of det(T_I), we have, for every I = (i_1, ..., i_{k+1}) ∈ ([m] choose k+1),\n\ndet(T_I) = Σ_{t=1}^{k+1} (−1)^{t+1} r_{i_t} det(S_{I_{−t}}).\n\nHere, I_{−t} = (i_1, ..., i_{t−1}, i_{t+1}, ..., i_{k+1}) ∈ ([m] choose k). This motivates us to define the following multilinear map Λ : C^{([m] choose 1)} × C^{([m] choose k)} → C^{([m] choose k+1)}: for all {a_t} ∈ C^{([m] choose 1)}, {b_J} ∈ C^{([m] choose k)}, and index sets I = (i_1, ..., i_{k+1}) ∈ ([m] choose k+1),\n\n[Λ(a, b)]_I = Σ_{t=1}^{k+1} (−1)^{t+1} a_{i_t} b_{I_{−t}}.\n\nNow, letting a_t = r_t and b_J = det(S_J), the inequality can be written as\n\n‖Λ(a, b)‖_p ≤ ( c_{p,k} / (k+1)^{1/p} ) ‖a‖_p ‖b‖_p.  (4)\n\n
Note that c_{p,k} / (k+1)^{1/p} = max(1, (k+1)^{1−2/p}). Let M_p = max(1, (k+1)^{1−2/p}), so that (4) can be rewritten as ‖Λ(a, b)‖_p ≤ M_p ‖a‖_p ‖b‖_p. We write\n\n1/p = (1−θ)/p_0 + θ/p_1;\n\nhere, when p ∈ [1, 2), we choose p_0 = 1, p_1 = 2; when p ∈ [2, +∞), we choose p_0 = 2, p_1 = +∞. Then, we can observe the following nice property of M_p:\n\nM_p = M_{p_0}^{1−θ} M_{p_1}^{θ}.  (5)\n\nThis is exactly the form required by the Riesz-Thorin theorem! Hence, we only need to show that (4) holds for p = 1, 2, ∞; applying Riesz-Thorin then proves all the intermediate cases p ∈ (1, 2) ∪ (2, ∞) immediately.\nWe leave the complete proof of (4) to the appendix.\n\n3 Lower Bounds\n\nIn this section, we give a proof sketch of Theorem 1.2. The proof is constructive: we prove the theorem by showing that for every ε > 0 we can construct a matrix A(ε) such that selecting any k columns of A(ε) leads to an approximation ratio of at least (k+1)^{1−1/p} / (1 + o_ε(1)). The theorem then follows by letting ε → 0+. Our choice of A(ε) is a perturbation of Hadamard matrices.\nThroughout the proof, we assume that k = 2^r − 1 for some r ∈ Z+, and that ε > 0 is an arbitrarily small constant. We consider the well-known Hadamard matrix of order (k+1) = 2^r, defined below:\n\nH^{(1)} = 1,   H^{(2^l)} = [ H^{(2^{l−1})}  H^{(2^{l−1})} ; H^{(2^{l−1})}  −H^{(2^{l−1})} ],  l ≥ 1.\n\nNow we can define A(ε), the lower-bound instance: it is a perturbation of H obtained by replacing all the entries in the first row by ε, i.e.,\n\nA(ε)_{ij} = ε when i = 1, and A(ε)_{ij} = H_{ij} when i ≠ 1.  (6)\n\nWe can see that A(ε) is close to a rank-k matrix. In fact, A(0) has rank at most k. 
Therefore, we can upper bound OPT by\n\nOPT ≤ ‖A(ε) − A(0)‖_p = ((k+1) ε^p)^{1/p} = (k+1)^{1/p} ε.  (7)\n\nThe remaining work is to give a lower bound on the approximation error when using any k columns. For simplicity of notation, we use A as shorthand for A(ε) when it is clear from context.\n\nAlgorithm 2 [1] SELECTCOLUMNS(k, A): Selecting O(k log m) columns of A.\n1: Input: Data matrix A and rank parameter k.\n2: Output: O(k log m) columns of A.\n3: if the number of columns of A ≤ 2k then\n4:   return all the columns of A\n5: else\n6:   repeat\n7:     Let R be 2k columns of A chosen uniformly at random\n8:   until at least a (1/10)-fraction of the columns of A are λ_p-approximately covered\n9:   Let A_R be the columns of A not approximately covered by R\n10:  return R ∪ SELECTCOLUMNS(k, A_R)\n11: end if\n\nAlgorithm 3 [1] An algorithm that transforms a rank-O(k log m) matrix factorization into a rank-k matrix factorization without inflating the error too much.\n1: Input: U ∈ R^{n×O(k log m)}, V ∈ R^{O(k log m)×m}\n2: Output: W ∈ R^{n×k}, Z ∈ R^{k×m}\n3: Apply Lemma E.1 to U to obtain the matrix W^0\n4: Run ℓp linear regression over Z^0 such that ‖W^0 Z^0 − U V‖_p is minimized\n5: Apply Algorithm 1 with input (Z^0)^T ∈ R^{m×O(k log m)} and k to obtain X and Y\n6: Set Z ← X^T\n7: Set W ← W^0 Y^T\n8: Output W and Z\n\nSay we are using all (k+1) columns except the j-th, i.e., the column subset is A_{[k+1]−{j}}. Obviously, we achieve zero error on all the columns other than the j-th. Therefore, the approximation error is essentially the ℓp distance from A_j to span{A_{[k+1]−{j}}}. 
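For p = 2 this construction can also be checked numerically: OPT equals the smallest singular value of A(ε), and the error of each k-column subset is computable by least squares, so the ratio can be seen to approach (k+1)^{1−1/p} = 2 for k = 3 as ε → 0. A sketch assuming numpy:

```python
import itertools
import numpy as np

def hadamard_ratio(eps):
    """Approximation ratio of 3-column subset selection on the perturbed
    4 x 4 Hadamard instance A(eps), for p = 2 (illustrative sketch)."""
    H = np.array([[1., 1., 1., 1.],
                  [1., -1., 1., -1.],
                  [1., 1., -1., -1.],
                  [1., -1., -1., 1.]])
    A = H.copy()
    A[0] = eps                                     # perturb the first row
    opt = np.linalg.svd(A, compute_uv=False)[-1]   # p = 2: OPT = sigma_4
    best = np.inf
    for J in itertools.combinations(range(4), 3):
        U = A[:, list(J)]
        V = np.linalg.lstsq(U, A, rcond=None)[0]
        best = min(best, np.linalg.norm(A - U @ V))
    return best / opt

# as eps -> 0 the ratio tends to (k+1)^(1 - 1/p) = 4^(1/2) = 2
assert abs(hadamard_ratio(1e-4) - 2.0) < 1e-2
```

Here A = diag(ε, 1, 1, 1) H, so the singular values are (2, 2, 2, 2ε) and OPT = 2ε, while every 3-column subset incurs error about 4ε, matching (7) and (8).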
We can show that the ℓp projection of A_j onto span{A_{[k+1]−{j}}} is very close to Σ_{i≠j} (−A_i); in other words,\n\nErr(A_{[k+1]−{j}}) = (1 − o(1)) ‖A_j − Σ_{i≠j} (−A_i)‖_p = (1 − o(1)) (k+1) ε.  (8)\n\nThe theorem follows by combining (7) and (8). The complete proof can be found in the appendix.\n\n4 Analysis of Efficient Algorithms\n\nOne drawback of the column subset selection algorithm is its time complexity: it requires O(m^k poly(n)) time, which is undesirable since it is exponential in k. However, several more efficient algorithms [1] are designed based on it. Our tighter analysis of Algorithm 1 implies better approximation guarantees for these algorithms as well. The improved bounds can be stated as follows:\n\nTheorem 4.1. Algorithm 2, which runs in poly(m, n, k) time and selects O(k log m) columns, is a bi-criteria O(c_{p,k}) = O((k+1)^{max(1/p, 1−1/p)}) approximation algorithm.\nTheorem 4.2. Algorithm 3, which runs in poly(m, n) time as long as k = O(log n / log log n), is an O(c_{p,k}^3 k log m) = O(k^{max(1+3/p, 4−3/p)} log m) approximation algorithm.\n\nThese results improve the previous O(k) and O(k^4 log m) bounds, respectively. We include the analysis of Algorithm 2 and Algorithm 3 in the appendix for completeness.\n\nAcknowledgments\n\nC.D. and P.R. acknowledge the support of Rakuten Inc., and NSF via IIS1909816. The authors would also like to acknowledge two MathOverflow users, known to us only by their usernames, 'fedja' and 'Mahdi', for informing us of the Riesz-Thorin interpolation theorem.\n\nReferences\n[1] Flavio Chierichetti, Sreenivas Gollapudi, Ravi Kumar, Silvio Lattanzi, Rina Panigrahy, and David P Woodruff. Algorithms for lp low rank approximation. arXiv preprint arXiv:1705.06730, 2017.\n[2] Emmanuel J Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? 
Journal of the ACM (JACM), 58(3):11, 2011.

[3] Peter J Huber. Robust statistics. In International Encyclopedia of Statistical Science, pages 1248–1251. Springer, 2011.

[4] Lei Xu and Alan L Yuille. Robust principal component analysis by self-organizing rules based on statistical physics approach. IEEE Transactions on Neural Networks, 6(1):131–143, 1995.

[5] Deyu Meng and Fernando De La Torre. Robust matrix factorization with unknown noise. In Proceedings of the IEEE International Conference on Computer Vision, pages 1337–1344, 2013.

[6] Naiyan Wang and Dit-Yan Yeung. Bayesian robust matrix factorization for image and video processing. In Proceedings of the IEEE International Conference on Computer Vision, pages 1785–1792, 2013.

[7] Liang Xiong, Xi Chen, and Jeff Schneider. Direct robust matrix factorization for anomaly detection. In Data Mining (ICDM), 2011 IEEE 11th International Conference on, pages 844–853. IEEE, 2011.

[8] Nicolas Gillis and Stephen A Vavasis. On the complexity of robust PCA and ℓ1-norm low-rank matrix approximation. Mathematics of Operations Research, 2018.

[9] Zhao Song, David P Woodruff, and Peilin Zhong. Low rank approximation with entrywise ℓ1-norm error. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 688–701. ACM, 2017.

[10] Nicolas Gillis and Yaroslav Shitov. Low-rank matrix approximation in the infinity norm. arXiv preprint arXiv:1706.00078, 2017.

[11] Frank Ban, Vijay Bhattiprolu, Karl Bringmann, Pavel Kolev, Euiwoong Lee, and David P Woodruff. A PTAS for ℓp-low rank approximation. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 747–766. SIAM, 2019.

[12] Anastasios Kyrillidis. Simple and practical algorithms for ℓp norm low-rank approximation. arXiv preprint arXiv:1805.09464, 2018.

[13] Pierre Baldi and Kurt Hornik.
Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks, 2(1):53–58, 1989.

[14] Amit Deshpande, Luis Rademacher, Santosh Vempala, and Grant Wang. Matrix approximation and projective clustering via volume sampling. In Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms, pages 1117–1126. Society for Industrial and Applied Mathematics, 2006.

[15] Alan Frieze, Ravi Kannan, and Santosh Vempala. Fast Monte-Carlo algorithms for finding low-rank approximations. Journal of the ACM (JACM), 51(6):1025–1041, 2004.

[16] Amit Deshpande and Santosh Vempala. Adaptive sampling and fast low-rank matrix approximation. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 292–303. Springer, 2006.

[17] Christos Boutsidis, Michael W Mahoney, and Petros Drineas. An improved approximation algorithm for the column subset selection problem. In Proceedings of the twentieth annual ACM-SIAM symposium on Discrete algorithms, pages 968–977. SIAM, 2009.

[18] Amit Deshpande and Luis Rademacher. Efficient volume sampling for row/column subset selection. In Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on, pages 329–338. IEEE, 2010.

[19] Petros Drineas, Michael W Mahoney, and S Muthukrishnan. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30(2):844–881, 2008.

[20] Christos Boutsidis and David P Woodruff. Optimal CUR matrix decompositions. SIAM Journal on Computing, 46(2):543–589, 2017.

[21] Yining Wang and Aarti Singh. Column subset selection with missing data via active sampling. In Artificial Intelligence and Statistics, pages 1033–1041, 2015.

[22] Aditya Bhaskara and Silvio Lattanzi. Non-negative sparse regression and column subset selection with ℓ1 error.
In LIPIcs-Leibniz International Proceedings in Informatics, volume 94. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.

[23] Chen Dan, Kristoffer Arnsfelt Hansen, He Jiang, Liwei Wang, and Yuchen Zhou. Low rank approximation of binary matrices: Column subset selection and generalizations. In Igor Potapov, Paul Spirakis, and James Worrell, editors, 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018), volume 117 of Leibniz International Proceedings in Informatics (LIPIcs), pages 41:1–41:16, Dagstuhl, Germany, 2018. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. ISBN 978-3-95977-086-6. doi: 10.4230/LIPIcs.MFCS.2018.41. URL http://drops.dagstuhl.de/opus/volltexte/2018/9623.

[24] Qifa Ke and Takeo Kanade. Robust ℓ1-norm factorization in the presence of outliers and missing data by alternative convex programming. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 739–746. IEEE, 2005.

[25] Feiping Nie, Jianjun Yuan, and Heng Huang. Optimal mean robust principal component analysis. In International Conference on Machine Learning, pages 1062–1070, 2014.

[26] Praneeth Netrapalli, UN Niranjan, Sujay Sanghavi, Animashree Anandkumar, and Prateek Jain. Non-convex robust PCA. In Advances in Neural Information Processing Systems, pages 1107–1115, 2014.

[27] Kai-Yang Chiang, Cho-Jui Hsieh, and Inderjit Dhillon. Robust principal component analysis with side information. In International Conference on Machine Learning, pages 2291–2299, 2016.

[28] Xinyang Yi, Dohyung Park, Yudong Chen, and Constantine Caramanis. Fast algorithms for robust PCA via gradient descent. In Advances in Neural Information Processing Systems, pages 4152–4160, 2016.

[29] Karl Bringmann, Pavel Kolev, and David Woodruff. Approximation algorithms for ℓ0-low rank approximation.
In Advances in Neural Information Processing Systems, pages 6648–6659, 2017.

[30] Ricardo Otazo, Emmanuel Candès, and Daniel K Sodickson. Low-rank plus sparse matrix decomposition for accelerated dynamic MRI with separation of background and dynamic components. Magnetic Resonance in Medicine, 73(3):1125–1136, 2015.

[31] Huan Xu, Constantine Caramanis, and Sujay Sanghavi. Robust PCA via outlier pursuit. In Advances in Neural Information Processing Systems, pages 2496–2504, 2010.

[32] Javad Mashreghi. Representation Theorems in Hardy Spaces, volume 74. Cambridge University Press, 2009.

[33] Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M Kakade, and Michael I Jordan. How to escape saddle points efficiently. arXiv preprint arXiv:1703.00887, 2017.

[34] Christos Boutsidis, Petros Drineas, and Malik Magdon-Ismail. Near-optimal column-based matrix reconstruction. SIAM Journal on Computing, 43(2):687–717, 2014.

[35] Yining Wang and Aarti Singh. Provably correct algorithms for matrix column subset selection with selectively sampled data. arXiv preprint arXiv:1505.04343, 2015.

[36] Zhao Song, David P Woodruff, and Peilin Zhong. Relative error tensor low rank approximation. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2772–2789. SIAM, 2019.

[37] Zhao Song, David P Woodruff, and Peilin Zhong. Towards a zero-one law for entrywise low rank approximation. arXiv preprint arXiv:1811.01442, 2018.

[38] Amit Deshpande, Madhur Tulsiani, and Nisheeth K Vishnoi. Algorithms and hardness for subspace approximation. In Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms, pages 482–496. Society for Industrial and Applied Mathematics, 2011.

[39] Ravindran Kannan, Santosh Vempala, et al. Spectral algorithms.
Foundations and Trends® in Theoretical Computer Science, 4(3–4):157–288, 2009.
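To make concrete the exhaustive column subset selection routine whose O(m^k · poly(n)) cost is discussed in Section 4, the following is a minimal numpy sketch: it enumerates all k-subsets of columns and fits the remaining columns by ℓ_p regression. This is an illustration under our own assumptions, not the paper's Algorithm 1; in particular, the function names and the use of IRLS (iteratively reweighted least squares, a standard heuristic that is exact only at p = 2) as the per-column ℓ_p regression solver are our choices.

```python
import itertools

import numpy as np


def lp_regression(S, b, p, iters=50, eps=1e-9):
    """Heuristic IRLS solver for min_x ||S x - b||_p (exact only when p = 2)."""
    x = np.linalg.lstsq(S, b, rcond=None)[0]  # least-squares warm start
    for _ in range(iters):
        r = S @ x - b
        # sqrt of the IRLS weights |r_i|^(p-2), clipped away from zero
        w = np.maximum(np.abs(r), eps) ** ((p - 2) / 2)
        x = np.linalg.lstsq(w[:, None] * S, w * b, rcond=None)[0]
    return x


def css_lp(A, k, p):
    """Brute-force column subset selection: try every k-subset of columns,
    express all columns of A in its span via l_p regression, and keep the
    subset with the smallest entry-wise l_p error. Runs in O(m^k) subsets."""
    n, m = A.shape
    best_cols, best_err = None, np.inf
    for cols in itertools.combinations(range(m), k):
        S = A[:, cols]
        X = np.column_stack([lp_regression(S, A[:, j], p) for j in range(m)])
        err = np.sum(np.abs(A - S @ X) ** p) ** (1.0 / p)
        if err < best_err:
            best_cols, best_err = cols, err
    return best_cols, best_err
```

For an exactly rank-k input, some k-subset of columns spans the column space, so the reported error should be near zero; on general inputs the theorems above bound the error of the best subset relative to the optimal rank-k approximation.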